1. Motivation¶

  • What is your dataset?
    For our project we used multiple datasets. The main ones are 'Historical Hourly Galicia Weather Dataset Recorded between 01-01-2000 and 30-04-2024' and 'Detected Wildfire Incidents and their Severity Dataset for the years between 2001-2022'. These feed our machine learning model, which predicts the critical thresholds of wildfire-correlated weather parameters that might trigger a wildfire; the model is explained in detail in the Data Analysis section. We also used other datasets such as
    • 'Total pollutant gasses and particles released from wildfires happening in Galicia between 2002-2023' (to showcase the environmental impact of wildfires)
    • 'Ecologically Irreplaceable Areas of Galicia' (indicates zones to be protected against any harm)
    • 'High Wildfire Risk Zones of Galicia' (indicate zones prone to wildfires)
    • 'Predicted weather data for 01-22 May 2024' (for spotting future dates on which the wildfire-correlated weather parameters reach the dangerous levels obtained from the developed ML model).

Finally, we use additional datasets showcasing regional tree cover extent, fire alert counts and forest loss due to wildfires to inform the audience.

  • Why did you choose this/these particular dataset(s)?
    The reasons for choosing each dataset are described below:
  1. Historical Hourly Galicia Weather Dataset Recorded between 01-01-2000 and 30-04-2024 :
    The relationship between certain weather parameters and wildfires is widely researched academically for wildfire prediction. For instance, there is a rule of thumb called the 30-30-30 crossover rule for the temperature, humidity and wind speed parameters, which can be used for fire prediction as stated in the source (Aug 8, 2018. How the 30-30-30 Crossover Rule affects the threat of a wildfire sparking. https://www.kelownanow.com/watercooler/news/news/Okanagan/%20How_the_30_30_30_Crossover_Rule_affects_the_threat_of_a_wildfire_sparking/#fs_136857)
    Therefore we wanted to investigate the weather parameters in our dataset that we believe are correlated with wildfire incidents. Their possible effects are explained below:
    • High temperatures can dry out vegetation, making it more susceptible to ignition and increasing the likelihood of fires.
    • Low humidity levels can dry out vegetation, making it more flammable and contributing to the rapid spread of fires.
    • Evapotranspiration parameter indicates the amount of water lost from the soil and vegetation, affecting fuel moisture content and fire risk.
    • Vapour Pressure Deficit measures the difference between the amount of moisture in the air and the maximum amount of moisture the air can hold, influencing vegetation dryness and fire behavior.
    • High wind speeds can accelerate the spread of fires by carrying embers and flames, making containment efforts more challenging.
    • Elevated soil temperatures can dry out vegetation and contribute to the overall flammability of the environment.
    • Low soil moisture levels can lead to drier vegetation, increasing the likelihood of fires and their intensity.
    • Direct Normal Solar Irradiance can dry out vegetation and contribute to the overall fire risk in an area.
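Both the 30-30-30 crossover rule and the vapour pressure deficit described above can be computed directly from the hourly records. A minimal sketch, using our dataset's column names; the Tetens approximation for saturation vapour pressure is our assumption, not necessarily the formula Open-Meteo uses:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly sample using the dataset's column names
df = pd.DataFrame({
    "temperature_2m (°C)": [32.0, 18.0],
    "relative_humidity_2m (%)": [25, 80],
    "wind_speed_10m (km/h)": [35.0, 10.0],
})

# 30-30-30 crossover rule: temperature above 30 °C, relative humidity
# below 30 %, wind speed above 30 km/h
df["crossover_30_30_30"] = (
    (df["temperature_2m (°C)"] > 30)
    & (df["relative_humidity_2m (%)"] < 30)
    & (df["wind_speed_10m (km/h)"] > 30)
)

# Vapour pressure deficit (kPa): saturation vapour pressure e_s(T) via
# the Tetens approximation, minus the actual vapour pressure e_s * RH/100
t = df["temperature_2m (°C)"]
e_s = 0.6108 * np.exp(17.27 * t / (t + 237.3))
df["vpd_kPa"] = e_s * (1 - df["relative_humidity_2m (%)"] / 100)
```

The first sample row (hot, dry, windy) trips the crossover rule and yields a VPD near the upper tail of the distribution in our summary statistics; the second (mild, humid, calm) does not.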

Source for the dataset: https://open-meteo.com/

  1. Detected Wildfire Incidents and their Severity Dataset for the years between 2001-2022:
    When working with the dynamics of wildfires in the Galicia region (and their relation to climate change), it is important to select appropriate satellite data. The Moderate Resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS) are two significant instruments used for remote sensing and scientific research (Joseph M. Smith. Apr 6, 2022. VIIRS Instruments Become More Essential As Terra and Aqua Drift from their Traditional Orbits. https://www.earthdata.nasa.gov/learn/articles/modis-to-viirs-transition). Our project opts for MODIS data for several reasons. The primary one is continuity: MODIS has been operational since 1999, providing over two decades of consistent and reliable data on various environmental parameters, including wildfires. VIIRS offers higher spatial resolution but only became operational in 2011. Though MODIS delivers coarser resolution (250-1000 m) than VIIRS, it remains more than adequate for our objective of detecting and analyzing wildfires at a regional scale, while the late deployment of VIIRS rules out the long-term record our historical analysis needs.
    Fire radiative power (FRP) is the most effective indicator for analyzing fire severity, compared to e.g. brightness. It is a direct measure of the radiant heat energy released, and so provides a quantifiable measure of the fire's intensity, which is closely related to the severity of the incident. Brightness is a more subjective measure and more prone to outside factors such as viewing angle and atmospheric conditions (Laurent, P., Mouillot, F., Moreno, M. V., Yue, C. and Ciais, P. 2019. Varying relationships between fire radiative power and fire size at a global scale. https://bg.copernicus.org/articles/16/275/2019/). Furthermore, FRP correlates closely with fire behavior, including rate of spread, fuel consumption and emissions.
This makes FRP our best tool for assessing severity and predicting behaviour.
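Severity classes derived from FRP (used later in the heatmap figure) can be obtained by binning; a minimal sketch, where the cut points are hypothetical placeholders rather than the project's actual bins:

```python
import pandas as pd

# Example FRP values in megawatts (illustrative, not real detections)
frp = pd.Series([5.0, 40.0, 250.0])

# Illustrative bins; the actual low/medium/high cut points are a
# modelling choice made when building the severity heatmap
severity = pd.cut(frp, bins=[0, 20, 100, float("inf")],
                  labels=["low", "medium", "high"])
```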

Source for the dataset: https://firms.modaps.eosdis.nasa.gov/

  1. Total pollutant gases and particles released from wildfires that happened in Galicia between 2002-2023 dataset:
    This dataset provides detailed information on the emission factors of various species (chemical compounds and particulate matter) for wildfires (Yongqiang Liu, Scott Goodrick, Warren Heilman. 2014. Wildland fire emissions, carbon, and climate: Wildfire–climate interactions. https://www.sciencedirect.com/science/article/pii/S037811271300114X). Here's a breakdown of the components and what they represent:
  • CO2 (Carbon Dioxide): The primary greenhouse gas emitted through the burning of biomass, representing the amount of carbon dioxide released per kilogram of dry matter burned.
  • CO (Carbon Monoxide): A harmful pollutant that is a byproduct of incomplete combustion, contributing to air pollution and human health issues.
  • CH4 (Methane): A potent greenhouse gas with a higher warming potential than CO2, though released in smaller quantities during biomass burning.
  • NMHC (Non-Methane Hydrocarbons): Volatile organic compounds excluding methane, contributing to ozone formation and air quality degradation.
  • H2 (Hydrogen): Released during combustion, contributing minimally to direct greenhouse gas effects but involved in atmospheric chemical reactions.
  • NOx (Nitrogen Oxides, as NO): Contributing to the formation of smog and acid rain, and affecting atmospheric chemistry and climate.
  • N2O (Nitrous Oxide): A powerful greenhouse gas with a long atmospheric lifetime, contributing to global warming and ozone layer depletion.
  • PM2.5 (Particulate Matter with diameter less than 2.5 micrometers): Fine particles that pose significant health risks due to their ability to penetrate deep into the respiratory tract.
  • TPM (Total Particulate Matter): Represents the total mass of particles emitted per kilogram of dry matter burned.
  • TPC (Total Particulate Carbon, consisting of OC+BC): The sum of organic carbon (OC) and black carbon (BC), contributing to climate change and air pollution.
  • OC (Organic Carbon): Part of particulate matter, affecting climate and air quality.
  • BC (Black Carbon): A component of fine particulate matter, significantly affecting the climate by absorbing sunlight.
  • SO2 (Sulfur Dioxide): Contributes to acid rain and has harmful health impacts.
  • NH3 (Ammonia): Affects atmospheric chemical processes and particulate matter formation.
  • DMCC (Dry Matter Carbon Content): Indicates the percentage of carbon in the dry matter, used for converting carbon emissions to the equivalent amount of dry matter burned.
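The DMCC conversion mentioned above can be sketched with a worked example; all numbers below are made up for illustration:

```python
# Convert an estimated carbon emission back to the equivalent dry matter
# burned, then apply a per-species emission factor.
carbon_emitted_kg = 450.0   # kg of carbon released (hypothetical)
dmcc_percent = 45.0         # dry matter carbon content, % (hypothetical)
ef_co2 = 1.6                # kg CO2 per kg dry matter burned (hypothetical)

# DMCC gives the carbon fraction of dry matter, so dividing by it
# recovers the mass of dry matter burned
dry_matter_kg = carbon_emitted_kg / (dmcc_percent / 100)

# An emission factor then scales dry matter into a species total
co2_kg = ef_co2 * dry_matter_kg
```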

Source of the dataset: https://gwis.jrc.ec.europa.eu/apps/country.profile/downloads

  1. Predicted weather data for 01-22 May 2024: This dataset gives forecasted values for the weather parameters we found to be correlated with wildfires, so that dangerous time periods can be detected in advance. For the trigger zones found with the ML model, we can raise an early warning whenever a forecasted parameter enters a high-fire-risk zone.

Source of the dataset: https://open-meteo.com/en/docs

  1. Total Annual Tree Cover Loss and Annual Tree Cover Loss due to Wildfires in Galicia, Spain data:
    We use this dataset mainly to show the increasing trend of tree cover loss each year and the role wildfires play in that loss for the Galicia, Spain region, to inform the audience about the seriousness of the wildfire issue. Not all tree cover is lost to wildfires: shifting agriculture, forestry, intentional man-made precautionary fires and urbanization also contribute.

Source of the dataset: https://www.globalforestwatch.org/

  1. Mean Burned Area in ha per $km^{2}$ and Mean Wildfire Incidents per $km^{2}$ by Regions of Spain - [2002-2023] Data:
    This dataset is used to show why we chose to investigate Galicia over the other regions of Spain. Judging by the amount of burned area and the high number of wildfire incidents, Galicia is clearly the region most prone to severe wildfire damage.

Source of the dataset: https://gwis.jrc.ec.europa.eu/apps/country.profile/

  1. Total fire alerts 2001-2023 and Tree cover distribution of Galicia by its Subregions Data:
    This data showcases the tree cover distribution and the number of recorded historical fire alerts for each subregion of Galicia, indicating which subregions carry a higher wildfire risk than others.

Source of the dataset: https://www.globalforestwatch.org/

  1. Ecologically Irreplaceable Areas of Galicia Dataset:
    This data is used to detect strategically important, ecologically irreplaceable zones inside Galicia, raising the reader's awareness of the zones that need extra precaution and monitoring.

Source of the dataset: https://forest-fire.emergency.copernicus.eu/

  1. High Wildfire Risk Zones of Galicia Dataset:

This dataset shows the high-risk zones inside Galicia, derived from vegetation and wildfire modelling, to raise awareness about the critical zones in the region.

Source of the dataset: https://forest-fire.emergency.copernicus.eu/

  • What was your goal for the end user's experience?
  1. Raising awareness about the increasing trend of wildfire incidents in Mediterranean countries, focusing on Galicia, Spain, and about their effect on tree cover loss and the release of greenhouse gases and pollutant particles, which cause irreplaceable ecological damage to the environment and wildlife.
  2. Informing the audience about the relationship between weather parameters and wildfire incidents.
  3. Suggesting a way to predict the weather conditions likely to trigger wildfires: a machine learning model, trained on historical weather data and historical wildfire incidents, predicts the danger zones for wildfire-correlated weather parameters.

2. Basic stats¶

  • Write about your choices in data cleaning and preprocessing?
  1. Merging data across different time periods.
  2. Aggregating data to a coarser time period where needed, so that the time resolutions of two datasets match before merging.
  3. Merging datasets that contain useful features for machine learning modelling.
  4. Cleaning missing values and dropping unnecessary columns in each dataset.
  5. Filtering the wildfire dataset to Galicia coordinates and to detections with high confidence.
  6. Keeping only the correlated features, based on a correlation analysis.
  7. Parsing datetimes and adding time-period columns for the month of year, season, week of year, day of year, hour of day, day of week, etc.
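The datetime parsing and period columns of step 7 can be sketched in pandas as follows; the column names mirror our summary tables, while the season encoding (1 = winter through 4 = autumn) is our assumption:

```python
import pandas as pd

# Hypothetical frame with the dataset's ISO-8601 timestamp column
df = pd.DataFrame({"time": ["2000-01-01T00:00", "2000-07-15T14:00"]})

df["time"] = pd.to_datetime(df["time"])
df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month
df["week"] = df["time"].dt.isocalendar().week
df["day_of_year"] = df["time"].dt.dayofyear
df["day_of_week"] = df["time"].dt.dayofweek + 1   # 1 = Monday ... 7 = Sunday
df["hour"] = df["time"].dt.hour
df["season"] = df["time"].dt.month % 12 // 3 + 1  # Dec-Feb -> 1, ..., Sep-Nov -> 4
```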
  • Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis

The summary statistics for the most important features of each dataset are shown in the tables below. The plots used for the EDA appear in the Data Visualizations and EDA part of this notebook further down.
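The tables follow the layout of pandas' `describe`; a minimal sketch with toy values shows how each row (count, mean, std, quartiles, extremes) is produced:

```python
import pandas as pd

# Toy sample of one weather column (values are illustrative)
df = pd.DataFrame({"temperature_2m (°C)": [1.9, 1.3, -0.1, -1.7, -2.2]})

# Transpose so that each variable becomes a row, as in the tables below
summary = df.describe().T.round(2)
```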


Variable Name count mean std min 25% 50% 75% max
temperature_2m (°C) 213288.00 11.73 6.06 -5.80 7.50 11.30 15.50 36.30
relative_humidity_2m (%) 213288.00 82.25 14.77 19.00 73.00 87.00 94.00 100.00
et0_fao_evapotranspiration (mm) 213288.00 0.10 0.15 0.00 0.00 0.02 0.14 0.80
vapour_pressure_deficit (kPa) 213288.00 0.32 0.42 0.00 0.07 0.16 0.41 4.23
wind_speed_10m (km/h) 213288.00 12.27 6.68 0.00 7.10 11.00 16.50 55.10
soil_temperature_0_to_7cm (°C) 213288.00 12.57 6.18 -2.40 8.00 11.90 16.70 34.40
soil_moisture_0_to_7cm (m³/m³) 213288.00 0.31 0.10 0.09 0.23 0.34 0.39 0.44
direct_normal_irradiance_instant (W/m²) 213288.00 185.55 283.67 0.00 0.00 0.00 317.80 983.90

Variable Name count mean std min 25% 50% 75% max
latitude 22277.00 42.46 0.40 41.81 42.14 42.40 42.74 43.73
longitude 22277.00 -7.81 0.71 -9.27 -8.45 -7.78 -7.18 -6.73
brightness 22277.00 325.26 22.83 300.00 309.70 319.20 333.70 505.40
scan 22277.00 1.73 0.89 1.00 1.10 1.40 2.10 4.80
track 22277.00 1.25 0.27 1.00 1.00 1.20 1.40 2.00
confidence 22277.00 73.45 23.24 0.00 59.00 77.00 95.00 100.00
bright_t31 22277.00 293.05 10.20 265.10 286.40 292.20 299.90 400.10
frp 22277.00 66.26 125.29 0.00 14.80 30.20 66.90 2956.20
type 22277.00 0.01 0.12 0.00 0.00 0.00 0.00 3.00
year 22277.00 2009.29 5.97 2001.00 2005.00 2006.00 2013.00 2022.00
season 22277.00 2.94 0.85 1.00 3.00 3.00 4.00 4.00
month 22277.00 7.26 2.48 1.00 7.00 8.00 9.00 12.00
week 22277.00 29.55 10.57 1.00 27.00 32.00 36.00 53.00
day_of_week 22277.00 4.06 2.10 1.00 2.00 4.00 6.00 7.00
hour 22277.00 16.20 5.73 3.00 12.00 14.00 23.00 24.00
day_of_month 22277.00 14.62 7.74 1.00 8.00 14.00 20.00 31.00
day_of_year 22277.00 204.34 74.30 1.00 188.00 221.00 248.00 366.00

Variable Name count mean std min 25% 50% 75% max
idprovincia 72757.00 28.26 8.02 15.00 27.00 32.00 36.00 36.00
burnt_area 72757.00 5.09 60.08 0.00 0.05 0.23 1.00 7352.14
latitude 72577.00 42.54 0.53 4.67 42.17 42.44 42.88 78.64
longitude 72575.00 -8.07 0.71 -9.43 -8.57 -8.10 -7.60 47.51
year 72757.00 2007.73 4.00 2003.00 2004.00 2006.00 2011.00 2018.00
season 72757.00 2.79 0.92 1.00 2.00 3.00 3.00 4.00
month 72757.00 6.53 2.65 1.00 4.00 7.00 8.00 12.00
week 72757.00 26.60 11.46 1.00 15.00 30.00 35.00 53.00
day_of_week 72757.00 4.14 2.02 1.00 2.00 4.00 6.00 7.00
hour 72757.00 15.98 6.14 1.00 14.00 17.00 20.00 24.00
day_of_month 72757.00 15.65 8.35 1.00 9.00 16.00 22.00 31.00
day_of_year 72757.00 183.25 80.36 1.00 104.00 207.00 243.00 366.00

Variable Name count mean std min 25% 50% 75% max
year 264.00 2012.50 6.36 2002.00 2007.00 2012.50 2018.00 2023.00
month 264.00 6.50 3.46 1.00 3.75 6.50 9.25 12.00
CO2 264.00 51182.32 249922.85 0.00 370.15 6157.90 20546.72 3546581.06
CO 264.00 2417.84 11604.11 0.00 17.43 314.05 981.43 163093.14
TPM 264.00 411.11 1980.41 0.00 2.35 50.55 177.45 27643.65
PM25 264.00 307.18 1495.35 0.00 1.98 39.04 127.99 20972.60
TPC 264.00 201.81 979.10 0.00 1.04 22.74 93.03 13548.09
NMHC 264.00 196.74 929.36 0.00 1.20 24.87 89.07 12881.31
OC 264.00 187.38 910.65 0.00 0.98 20.41 87.47 12582.94
CH4 264.00 90.63 425.13 0.00 0.61 12.09 39.37 5923.37
SO2 264.00 24.15 117.54 0.00 0.13 2.85 10.24 1640.21
BC 264.00 14.29 68.02 0.00 0.10 1.82 5.87 954.82
NOx 264.00 88.06 435.89 0.00 0.58 10.54 36.76 6250.45

3. Data Analysis¶

  • Describe your data analysis and explain what you've learned about the dataset.
    Through data collection, cleaning, analysis, interpretation and visualization we learned that correct aggregation, removal of missing values and unnecessary attributes, careful merging, proper parsing of dates and formatting of features, and using only the correlated features for our visualizations and ML model are all critically important. From the plots we created, the following trends and insights can be observed:
  1. From Figure 1: Galicia is the region of Spain most prone to wildfires, due to its high number of fire incidents and the largest burned area.
  2. From Figure 2: There is an increasing trend in tree cover loss each year in Galicia, and a considerable and growing share of it is caused by wildfires.
  3. From Figures 3 and 3.1: Ourense is the subregion most prone to suffer from wildfires, with a critically high number of fire alerts.
  4. From Figure 4: We see where serious wildfire incidents (burning more than 100 ha) happened in the past and in which subregion. The danger level of each historical fire incident in Galicia is indicated visually by the diameter of its circle.
  5. From Figure 5: This figure conveys the severity of wildfire incidents in Galicia using the historical Fire Radiative Power generated by wildfires; severity is represented with a heatmap.
  6. From Figure 6: August is the most critical month for wildfires, since it has the highest total emission of CO2, the main greenhouse gas released by wildfires.
  7. From Figure 7: Other very harmful pollutants are also released into the environment by wildfires, so wildfires do not only harm the environment by removing tree cover and vegetation; they also pollute the area dramatically in many other ways. August is again the most critical month, with the highest bars for released pollutants.
  8. From Figure 8: The calendar plot of daily wildfire incident frequency (2001-2022) provides an overall view of which days, months and years were worse than others, revealing the likelihood pattern of wildfires.
  9. From Figure 9: We can analyze the weather parameters most correlated with fire incidents and whether their relationship with wildfires is positive or negative.

The rest of the figures show the results of our machine learning prediction model, which is explained in the next section.

  • If relevant, talk about your machine-learning.

Our machine learning model aims to predict, for each weather parameter with at least low-moderate correlation with wildfire incidents, the critical threshold value beyond which conditions create a high likelihood of wildfires. These thresholds define red danger zones that we overlay on future weather forecast data, so we can predict the exact date and time when a high wildfire risk can be expected. The predicted times of possible future wildfire occurrences can then be used to protect the irreplaceable ecological zones and fire-prone vegetation areas shown in the choropleth maps at the end.
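A much-simplified stand-in for this idea (not our actual model) is to take a quantile of a parameter's distribution at detected fire times as the start of its red zone, then flag forecast hours that enter it; all values below are illustrative:

```python
import pandas as pd

# Hypothetical temperatures observed at detected fire times (°C)
fire_temps = pd.Series([24.0, 27.5, 29.0, 31.0, 33.5, 28.0, 30.0])

# Simplified threshold: the lower quartile of values seen during fires
# marks the start of the "red zone" for this parameter
threshold = fire_temps.quantile(0.25)

# Flag forecast hours whose temperature enters the red zone
forecast = pd.Series([18.0, 26.0, 32.0])
in_red_zone = forecast >= threshold
```

The real model combines several correlated parameters; this sketch only illustrates the thresholding step for one of them.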

4. Genre.¶


Why the magazine style genre? For this project we chose the magazine style for presenting our findings and visualizing the data. The genre offers several advantages that make it an effective medium for the message we aim to convey.

Firstly, the magazine style is well-suited for presenting complex information in an engaging and accessible manner, allowing us to structure the content in a curated and visually appealing format. Ease of access and low barriers to entry are important for our target audience, who may not have a technical background or base knowledge of the subject at hand. The magazine style lets us present the data and information with multiple facets, both informative and entertaining, making it more likely that our audience will engage with the content and make informed decisions.

Secondly, the magazine style suits a mixture of different visualizations and diverse narrative tools presented in a curated way. By incorporating images, data visualizations and text we can visually highlight important features and guide the viewer through the narrative. The style also accommodates interactive elements, such as hover highlighting and filtering, which enhance the user experience and provide additional insights for more curious and/or professional users.

Thirdly, the style works well with our linear, curated narrative. By structuring the content in a logical and sequential manner we can convey the information effectively. The linear narrative also allows us to use captions, headlines and introductory text to provide context and summarize key points.

Lastly, the magazine style is an effective medium for a broad audience, which matches our aspiration to present the data to locals and tourists alike, who may have vastly different knowledge of the subject. The style is widely recognizable and its layout familiar to most, improving mapping and affordance considerations; we believe we can reach a wider audience by sticking to a familiar style. This is especially true for our machine learning feature of future risk predictions based on meteorological prognoses, where familiarity with existing platforms (such as weather forecasts) eases access for repeat users.
  • Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer). Why?
    Visual narrative tools:

  • Visual structuring: Consistent visual platform

    As touched upon, we strive to give users a curated experience so they can make informed decisions. This tool ensures that the visual elements in our narrative are presented in a consistent and cohesive manner, making it easier for the audience to follow the narrative.

  • Highlighting: Feature distinction

    To draw attention to specific features within the narrative, our main tool has been feature distinction, achieved through various visual elements.

  • Transition guidance: Object continuity and viewing angle

    To help the user's journey through the narrative we have strived for consistency through object continuity, which creates a sense of continuity and supports our curated approach. Our viewing angle is likewise kept consistent: the perspective we present to the user strives to be constant. By combining object continuity with a consistent viewing angle, we create a more engaging and immersive visual narrative that guides the user's attention.


  • Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer). Why?
    Narrative structure:

  • Ordering: Linear

    Our storytelling takes a linear approach through the data being sequential and chronological. This induces order and creates a clear and logical structure, effectively making it easier for the audience to follow.

  • Interactivity: Hover highlighting, filtering, navigation

    Interactivity is a key element in our storytelling, as it allows and facilitates audience engagement. In our case we used various interactivity tools to achieve this.

  • Messaging: Captions/Headlines, introductory text, summaries

    In order to target a broad audience we need to make sure key messages are conveyed effectively. This is even more important considering the risks of misunderstanding the wildfire data. We therefore strive to communicate the main ideas and insights of our findings in a clear and concise way.


5. Visualizations¶

  • Explain the visualizations you've chosen. & - Why are they right for the story you want to tell?
  1. Figure 1: Bar and line chart
    Mean burned area and wildfire incidents per km² by region in Spain (2002-2023), showing regional variations in fire severity and frequency. The number of fire incidents is shown with the line and the total amount of burned area with the bars. The x-axis shows the regions of Spain; the y-axes show burned area and number of fire incidents. This plot is used to choose the region of Spain most prone to wildfires.

  2. Figure 2: Overlaid bar chart (the bars are not stacked on top of each other; instead, tree loss due to wildfire is drawn inside total tree loss to show what share of the total loss wildfires caused)
    Annual tree cover loss in Galicia, Spain from 2001 to 2023 (green), distinguishing total loss from loss specifically due to wildfires (red), highlighting years with significant wildfire impact and the general trend of tree cover loss.

  3. Figure 3: Pie chart
    Distribution of tree cover across the subregions of Galicia with specific area measurements in hectares for each subregion of Galicia.

  4. Figure 3.1: Horizontal bar chart
    Total alerts by subregion of Galicia, illustrating the number of alerts reported in each subregion from highest to lowest. Ourense stands out as the subregion of Galicia most prone to wildfires.

  5. Figure 4: Interactive geospatial map overlaid with a point cluster map
    Geospatial map illustrating historical wildfires in Galicia that burned more than 100 ha. Each subregion's historical wildfires are shown in a different colour, and the bigger the burned area, the bigger the circle diameter. From this we can spot the exact locations of severe past fire incidents in Galicia, which may indicate high-risk zones.

  6. Figure 5: Geospatial Heatmap
    Heatmap of wildfire severity in Galicia from 2001-2022, categorized by Fire Radiative Power (FRP) in megawatts, illustrating areas with low, medium and high wildfire intensity, which can later be used to detect high-risk zones.

  7. Figure 6: Interactive Radial polar barchart
    The plot displays the monthly distribution of CO2 emissions, in metric tonnes per kilogram of dry matter burned, from wildfires in Galicia from 2002-2023, highlighting the peak emissions during the summer months.

  8. Figure 7: Interactive horizontal grouped barchart
    Horizontal bar chart showing the monthly distribution of emissions from pollutants released by wildfire incidents, detailing the total emissions in metric tons per month for pollutants like CO (a separate bar grouped with the other, stacked pollutants), CH4, NOx and others, highlighting the peak emissions during July, August, September and October.

  9. Figure 8: Calendar Plot
    Indicates the total number of detected fire incidents (0 to 50+) per day in Galicia between 2001-2022.

  10. Figure 9: Correlation matrix
    Shows the correlation heatmap for the weather parameters that are more correlated with wildfire incidents than the other weather parameters. The selected features are temperature, humidity, precipitation, wind speed, soil temperature, soil moisture, evapotranspiration, vapour pressure deficit and solar radiation.
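Selecting features by correlation (preprocessing step 6) can be sketched as below; the toy frame and the 0.3 cut-off are illustrative choices, not our exact pipeline:

```python
import pandas as pd

# Toy daily frame: two weather parameters plus a daily fire-incident count
df = pd.DataFrame({
    "temperature_2m (°C)": [10, 15, 22, 30, 33],
    "relative_humidity_2m (%)": [90, 80, 60, 35, 25],
    "fire_count": [0, 0, 1, 3, 5],
})

# Pearson correlation of each feature with the fire count
corr = df.corr()["fire_count"].drop("fire_count")

# Keep only features with at least a low-moderate absolute correlation
selected = corr[corr.abs() >= 0.3].index.tolist()
```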

  11. Figure 10: Boxplot
    Box plots displaying the distributions, with confidence intervals, of the key weather parameters under which the ML model predicts a high likelihood of wildfire occurrence, including temperature, humidity, precipitation, wind speed, soil temperature, soil moisture, evapotranspiration, vapour pressure deficit and solar radiation, which are crucial for understanding wildfire risks.

  12. Figure 11: Interactive Time series Plot
    Using the threshold points that might trigger a high possibility of wildfire occurrence, we generated red danger zones; if a weather parameter's value enters its red zone, we can expect conditions likely to allow a wildfire. We overlay the future weather forecast to read off the exact time and date at which a weather parameter enters its red zone, and use that information to predict possible future wildfires for a location whose weather conditions we know.
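Reading off the first date and time a forecasted parameter enters its red zone can be sketched as follows; the threshold and forecast values are made up:

```python
import pandas as pd

# Hypothetical hourly forecast starting 01 May 2024 (values in °C)
idx = pd.to_datetime("2024-05-01") + pd.to_timedelta(range(5), unit="h")
forecast = pd.Series([24.0, 27.0, 29.5, 30.5, 31.0], index=idx)
threshold = 30.0  # assumed red-zone boundary for this parameter

# First timestamp at which the forecast crosses into the red zone
in_red = forecast[forecast > threshold]
first_danger = in_red.index[0] if not in_red.empty else None
```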

  13. Figures 12 and 13: Choropleth maps
    The purple choropleth map shows ecologically irreplaceable zones that should be protected, so they can be prioritized when wildfire preemptive measures are taken. The red choropleth map shows high-risk zones, derived from vegetation and wildfire modelling, which can be used to raise wildfire awareness in the region so more precautions are taken.

6. Discussion¶

  • What went well?
    We found a lot of useful open-source datasets backing up our arguments and the aims of our prediction model. Thanks to the availability and usefulness of these datasets, we were able to reach an 87% accuracy score.

  • What is still missing? What could be improved? Why?
    As mentioned before, tree cover loss and wildfires can have several causes: human mistakes, arson, man-made preemptive intentional forest fires, forestry, agricultural shifting, urbanization, etc. Our wildfire dataset has no parameter to make that distinction, which badly affects the correlation of weather features with wildfire incidents: only for naturally occurring, climate-driven wildfires can we say something about the effect of weather conditions and use it to predict the location and datetime of future wildfires. With that missing attribute available, we believe we could improve our model remarkably.

7. Contributions¶

Coding Lead: Ali Berk Gezgin Support: Nael Rashdeen, Joakim Wiben Gundersen

Website/GitHub Lead: Nael Rashdeen Support: Ali Berk Gezgin, Joakim Wiben Gundersen

Narrative (writing) Lead: Joakim Wiben Gundersen Support: Nael Rashdeen, Ali Berk Gezgin

Loading the package libraries¶

In [ ]:
import pandas as pd
import numpy as np
import datetime
import itertools
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
from matplotlib.lines import Line2D
import seaborn as sns
from bokeh.plotting import figure, show, output_file
from bokeh.models import ColumnDataSource, Legend, LegendItem, HoverTool
from bokeh.layouts import column
from bokeh.io import output_notebook
from bokeh.palettes import Category20
from mpl_toolkits.basemap import Basemap
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import calplot
import geopandas as gpd
import folium
from folium.plugins import HeatMap
from shapely.geometry import Point, Polygon

Data Preprocessing & Cleaning¶

Loading of Hourly Weather dataset for Galicia for the dates between 01-01-2000 00:00 to 30-04-2024 23:00

In [ ]:
galicia_weather = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Galicia_hourly_weather_data_00_24.csv")
galicia_weather.head(5)
Out[ ]:
time temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) vapour_pressure_deficit (kPa) ... soil_moisture_7_to_28cm (m³/m³) soil_moisture_28_to_100cm (m³/m³) soil_moisture_100_to_255cm (m³/m³) is_day () sunshine_duration (s) shortwave_radiation_instant (W/m²) direct_radiation_instant (W/m²) diffuse_radiation_instant (W/m²) direct_normal_irradiance_instant (W/m²) terrestrial_radiation_instant (W/m²)
0 2000-01-01T00:00 1.9 83 -0.7 0.0 1029.0 963.6 29 0.0 0.12 ... 0.403 0.415 0.399 0 0.0 0.0 0.0 0.0 0.0 0.0
1 2000-01-01T01:00 1.3 85 -0.9 0.0 1029.2 963.6 24 0.0 0.10 ... 0.402 0.415 0.399 0 0.0 0.0 0.0 0.0 0.0 0.0
2 2000-01-01T02:00 -0.1 89 -1.7 0.0 1029.3 963.4 24 0.0 0.07 ... 0.402 0.415 0.399 0 0.0 0.0 0.0 0.0 0.0 0.0
3 2000-01-01T03:00 -1.7 92 -2.9 0.0 1028.8 962.6 16 0.0 0.05 ... 0.402 0.415 0.399 0 0.0 0.0 0.0 0.0 0.0 0.0
4 2000-01-01T04:00 -2.2 92 -3.3 0.0 1028.7 962.4 8 0.0 0.04 ... 0.402 0.414 0.399 0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 27 columns

In [ ]:
galicia_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213288 entries, 0 to 213287
Data columns (total 27 columns):
 #   Column                                   Non-Null Count   Dtype  
---  ------                                   --------------   -----  
 0   time                                     213288 non-null  object 
 1   temperature_2m (°C)                      213288 non-null  float64
 2   relative_humidity_2m (%)                 213288 non-null  int64  
 3   dew_point_2m (°C)                        213288 non-null  float64
 4   precipitation (mm)                       213288 non-null  float64
 5   pressure_msl (hPa)                       213288 non-null  float64
 6   surface_pressure (hPa)                   213288 non-null  float64
 7   cloud_cover (%)                          213288 non-null  int64  
 8   et0_fao_evapotranspiration (mm)          213288 non-null  float64
 9   vapour_pressure_deficit (kPa)            213288 non-null  float64
 10  wind_speed_10m (km/h)                    213288 non-null  float64
 11  wind_gusts_10m (km/h)                    213288 non-null  float64
 12  soil_temperature_0_to_7cm (°C)           213288 non-null  float64
 13  soil_temperature_7_to_28cm (°C)          213288 non-null  float64
 14  soil_temperature_28_to_100cm (°C)        213288 non-null  float64
 15  soil_temperature_100_to_255cm (°C)       213288 non-null  float64
 16  soil_moisture_0_to_7cm (m³/m³)           213288 non-null  float64
 17  soil_moisture_7_to_28cm (m³/m³)          213288 non-null  float64
 18  soil_moisture_28_to_100cm (m³/m³)        213288 non-null  float64
 19  soil_moisture_100_to_255cm (m³/m³)       213288 non-null  float64
 20  is_day ()                                213288 non-null  int64  
 21  sunshine_duration (s)                    213288 non-null  float64
 22  shortwave_radiation_instant (W/m²)       213288 non-null  float64
 23  direct_radiation_instant (W/m²)          213288 non-null  float64
 24  diffuse_radiation_instant (W/m²)         213288 non-null  float64
 25  direct_normal_irradiance_instant (W/m²)  213288 non-null  float64
 26  terrestrial_radiation_instant (W/m²)     213288 non-null  float64
dtypes: float64(23), int64(3), object(1)
memory usage: 43.9+ MB

Loading the MODIS fire datasets for the years 2000-2022

In [ ]:
fire2022 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2022_Spain.csv")
fire2021 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2021_Spain.csv")
fire2020 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2020_Spain.csv")
fire2019 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2019_Spain.csv")
fire2018 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2018_Spain.csv")
fire2017 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2017_Spain.csv")
fire2016 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2016_Spain.csv")
fire2015 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2015_Spain.csv")
fire2014 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2014_Spain.csv")
fire2013 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2013_Spain.csv")
fire2012 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2012_Spain.csv")
fire2011 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2011_Spain.csv")
fire2010 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2010_Spain.csv")
fire2009 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2009_Spain.csv")
fire2008 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2008_Spain.csv")
fire2007 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2007_Spain.csv")
fire2006 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2006_Spain.csv")
fire2005 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2005_Spain.csv")
fire2004 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2004_Spain.csv")
fire2003 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2003_Spain.csv")
fire2002 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2002_Spain.csv")
fire2001 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2001_Spain.csv")
fire2000 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2000_Spain.csv")
fire2016.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time satellite instrument confidence version bright_t31 frp daynight type
0 40.4253 -1.4254 305.4 1.5 1.2 2016-01-05 1125 Terra MODIS 61 6.2 277.4 18.2 D 0
1 37.5847 -5.8172 302.1 1.1 1.0 2016-01-05 1126 Terra MODIS 48 6.2 284.7 6.9 D 0
2 38.7263 -0.7202 301.5 1.1 1.0 2016-01-05 1304 Aqua MODIS 32 6.2 286.5 5.9 D 0
3 38.7225 -0.7440 300.5 1.1 1.0 2016-01-05 1304 Aqua MODIS 22 6.2 285.9 5.2 D 0
4 38.7153 -0.7298 326.4 1.1 1.0 2016-01-05 1304 Aqua MODIS 80 6.2 286.8 27.7 D 0
In [ ]:
fire2016.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3211 entries, 0 to 3210
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   latitude    3211 non-null   float64
 1   longitude   3211 non-null   float64
 2   brightness  3211 non-null   float64
 3   scan        3211 non-null   float64
 4   track       3211 non-null   float64
 5   acq_date    3211 non-null   object 
 6   acq_time    3211 non-null   int64  
 7   satellite   3211 non-null   object 
 8   instrument  3211 non-null   object 
 9   confidence  3211 non-null   int64  
 10  version     3211 non-null   float64
 11  bright_t31  3211 non-null   float64
 12  frp         3211 non-null   float64
 13  daynight    3211 non-null   object 
 14  type        3211 non-null   int64  
dtypes: float64(8), int64(3), object(4)
memory usage: 376.4+ KB

In the cells below we merge the yearly fire datasets and filter them to the Galicia bounding box. We then parse the timestamps and add columns indicating the time period: month of year, season, week of year, day of year, hour of day, day of week, etc. Finally, we drop missing values and unnecessary columns.

In [ ]:
merged_forest_fire_incidents_galicia_2000_2022=pd.concat([fire2000,fire2001,fire2002,fire2003,fire2004,fire2005,fire2006
                                                         ,fire2007,fire2008,fire2009,fire2010,fire2011,fire2012,fire2013
                                                         ,fire2014,fire2015,fire2016,fire2017,fire2018,fire2019,fire2020
                                                         ,fire2021,fire2022], axis=0)
merged_forest_fire_incidents_galicia_2000_2022.reset_index(drop=True, inplace=True)
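The 23 per-year read_csv calls and the concat above can be condensed into a single loop; a minimal sketch, assuming the same modis_<year>_Spain.csv naming scheme (the function name and directory argument are ours):

```python
from pathlib import Path

import pandas as pd

def load_yearly_fires(base_dir, years):
    """Read one MODIS CSV per year and stack them into a single DataFrame.

    ignore_index=True replaces the reset_index(drop=True) step above.
    """
    frames = [pd.read_csv(Path(base_dir) / f"modis_{year}_Spain.csv") for year in years]
    return pd.concat(frames, axis=0, ignore_index=True)

# merged_forest_fire_incidents_galicia_2000_2022 = load_yearly_fires(
#     r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia",
#     range(2000, 2023))
```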
In [ ]:
merged_forest_fire_incidents_galicia_2000_2022.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time satellite instrument confidence version bright_t31 frp daynight type
0 43.5249 -5.7303 301.1 1.0 1.0 2000-11-01 1131 Terra MODIS 45 6.2 269.8 7.8 D 2
1 41.5184 -2.0833 312.4 1.1 1.1 2000-11-01 1132 Terra MODIS 55 6.2 280.1 15.8 D 0
2 41.3399 -2.6720 309.7 1.1 1.0 2000-11-01 1132 Terra MODIS 0 6.2 274.0 12.6 D 0
3 40.2732 -3.1756 319.2 1.1 1.0 2000-11-01 1132 Terra MODIS 79 6.2 288.3 19.9 D 0
4 40.2479 -3.4714 304.2 1.1 1.0 2000-11-01 1132 Terra MODIS 58 6.2 285.4 6.1 D 0
In [ ]:
min_longitude, max_longitude = -9.30, -6.73
min_latitude, max_latitude = 41.8, 43.8
In [ ]:
filtered_galicia_fires_00_22 = merged_forest_fire_incidents_galicia_2000_2022[
    (merged_forest_fire_incidents_galicia_2000_2022['longitude'] >= min_longitude) &
    (merged_forest_fire_incidents_galicia_2000_2022['longitude'] <= max_longitude) &
    (merged_forest_fire_incidents_galicia_2000_2022['latitude'] >= min_latitude) &
    (merged_forest_fire_incidents_galicia_2000_2022['latitude'] <= max_latitude)
]
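The bounding-box filter above can be wrapped in a small reusable helper, which keeps the long DataFrame name out of the boolean expressions; a sketch (the function name is ours, Series.between is inclusive on both ends like the >= and <= comparisons above):

```python
import pandas as pd

def filter_bbox(df, min_lon, max_lon, min_lat, max_lat,
                lon_col="longitude", lat_col="latitude"):
    """Keep only rows whose coordinates fall inside the bounding box (inclusive)."""
    mask = (
        df[lon_col].between(min_lon, max_lon)
        & df[lat_col].between(min_lat, max_lat)
    )
    return df[mask]

# filtered_galicia_fires_00_22 = filter_bbox(
#     merged_forest_fire_incidents_galicia_2000_2022, -9.30, -6.73, 41.8, 43.8)
```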
In [ ]:
filtered_galicia_fires_00_22.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time satellite instrument confidence version bright_t31 frp daynight type
172 42.5118 -8.4374 300.4 1.1 1.0 2001-02-17 1154 Terra MODIS 36 6.2 286.7 5.6 D 0
177 42.2953 -8.2946 305.0 1.0 1.0 2001-02-19 1142 Terra MODIS 60 6.2 283.9 8.7 D 0
178 42.2688 -8.2864 311.8 1.0 1.0 2001-02-19 2248 Terra MODIS 83 6.2 275.9 16.2 N 0
186 42.2428 -6.8630 314.8 1.1 1.0 2001-02-21 1130 Terra MODIS 71 6.2 279.2 15.2 D 0
187 42.2881 -8.3451 317.1 1.2 1.1 2001-02-21 1130 Terra MODIS 77 6.2 288.3 20.2 D 0
In [ ]:
filtered_galicia_fires_00_22 = filtered_galicia_fires_00_22.drop(['satellite', 'instrument','version'], axis=1)
filtered_galicia_fires_00_22.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time confidence bright_t31 frp daynight type
172 42.5118 -8.4374 300.4 1.1 1.0 2001-02-17 1154 36 286.7 5.6 D 0
177 42.2953 -8.2946 305.0 1.0 1.0 2001-02-19 1142 60 283.9 8.7 D 0
178 42.2688 -8.2864 311.8 1.0 1.0 2001-02-19 2248 83 275.9 16.2 N 0
186 42.2428 -6.8630 314.8 1.1 1.0 2001-02-21 1130 71 279.2 15.2 D 0
187 42.2881 -8.3451 317.1 1.2 1.1 2001-02-21 1130 77 288.3 20.2 D 0

Dropping missing values from the merged fire data filtered to the Galicia coordinates

In [ ]:
filtered_galicia_fires_00_22 = filtered_galicia_fires_00_22.dropna()
filtered_galicia_fires_00_22.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time confidence bright_t31 frp daynight type
172 42.5118 -8.4374 300.4 1.1 1.0 2001-02-17 1154 36 286.7 5.6 D 0
177 42.2953 -8.2946 305.0 1.0 1.0 2001-02-19 1142 60 283.9 8.7 D 0
178 42.2688 -8.2864 311.8 1.0 1.0 2001-02-19 2248 83 275.9 16.2 N 0
186 42.2428 -6.8630 314.8 1.1 1.0 2001-02-21 1130 71 279.2 15.2 D 0
187 42.2881 -8.3451 317.1 1.2 1.1 2001-02-21 1130 77 288.3 20.2 D 0
In [ ]:
filtered_galicia_fires_00_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   latitude    22277 non-null  float64
 1   longitude   22277 non-null  float64
 2   brightness  22277 non-null  float64
 3   scan        22277 non-null  float64
 4   track       22277 non-null  float64
 5   acq_date    22277 non-null  object 
 6   acq_time    22277 non-null  int64  
 7   confidence  22277 non-null  int64  
 8   bright_t31  22277 non-null  float64
 9   frp         22277 non-null  float64
 10  daynight    22277 non-null  object 
 11  type        22277 non-null  int64  
dtypes: float64(7), int64(3), object(2)
memory usage: 2.2+ MB

Parsing the weather and fire datasets to add columns indicating year, season, month, day of week, etc.

In [ ]:
galicia_weather['time'] = pd.to_datetime(galicia_weather['time'], format='%Y-%m-%dT%H:%M')

galicia_weather['year'] = galicia_weather['time'].dt.year

# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1  # Winter
    elif month in [3, 4, 5]:
        return 2  # Spring
    elif month in [6, 7, 8]:
        return 3  # Summer
    else:
        return 4  # Autumn

# Applying the function to the data
galicia_weather['season'] = galicia_weather['time'].dt.month.apply(get_season)

# Extracting the month
galicia_weather['month'] = galicia_weather['time'].dt.month

# Extracting the week of the year
galicia_weather['week'] = galicia_weather['time'].dt.isocalendar().week

# Extracting the day of the week (1 = Monday, 7 = Sunday)
galicia_weather['day_of_week'] = galicia_weather['time'].dt.dayofweek + 1

# Extracting the hour, shifted from 0-23 to a 1-24 range
galicia_weather['hour'] = galicia_weather['time'].dt.hour + 1

# Extracting the day of the month
galicia_weather['day_of_month'] = galicia_weather['time'].dt.day

# Extracting the day of the year
galicia_weather['day_of_year'] = galicia_weather['time'].dt.dayofyear

galicia_weather.head(5)
Out[ ]:
time temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) vapour_pressure_deficit (kPa) ... direct_normal_irradiance_instant (W/m²) terrestrial_radiation_instant (W/m²) year season month week day_of_week hour day_of_month day_of_year
0 2000-01-01 00:00:00 1.9 83 -0.7 0.0 1029.0 963.6 29 0.0 0.12 ... 0.0 0.0 2000 1 1 52 6 1 1 1
1 2000-01-01 01:00:00 1.3 85 -0.9 0.0 1029.2 963.6 24 0.0 0.10 ... 0.0 0.0 2000 1 1 52 6 2 1 1
2 2000-01-01 02:00:00 -0.1 89 -1.7 0.0 1029.3 963.4 24 0.0 0.07 ... 0.0 0.0 2000 1 1 52 6 3 1 1
3 2000-01-01 03:00:00 -1.7 92 -2.9 0.0 1028.8 962.6 16 0.0 0.05 ... 0.0 0.0 2000 1 1 52 6 4 1 1
4 2000-01-01 04:00:00 -2.2 92 -3.3 0.0 1028.7 962.4 8 0.0 0.04 ... 0.0 0.0 2000 1 1 52 6 5 1 1

5 rows × 35 columns

In [ ]:
galicia_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213288 entries, 0 to 213287
Data columns (total 35 columns):
 #   Column                                   Non-Null Count   Dtype         
---  ------                                   --------------   -----         
 0   time                                     213288 non-null  datetime64[ns]
 1   temperature_2m (°C)                      213288 non-null  float64       
 2   relative_humidity_2m (%)                 213288 non-null  int64         
 3   dew_point_2m (°C)                        213288 non-null  float64       
 4   precipitation (mm)                       213288 non-null  float64       
 5   pressure_msl (hPa)                       213288 non-null  float64       
 6   surface_pressure (hPa)                   213288 non-null  float64       
 7   cloud_cover (%)                          213288 non-null  int64         
 8   et0_fao_evapotranspiration (mm)          213288 non-null  float64       
 9   vapour_pressure_deficit (kPa)            213288 non-null  float64       
 10  wind_speed_10m (km/h)                    213288 non-null  float64       
 11  wind_gusts_10m (km/h)                    213288 non-null  float64       
 12  soil_temperature_0_to_7cm (°C)           213288 non-null  float64       
 13  soil_temperature_7_to_28cm (°C)          213288 non-null  float64       
 14  soil_temperature_28_to_100cm (°C)        213288 non-null  float64       
 15  soil_temperature_100_to_255cm (°C)       213288 non-null  float64       
 16  soil_moisture_0_to_7cm (m³/m³)           213288 non-null  float64       
 17  soil_moisture_7_to_28cm (m³/m³)          213288 non-null  float64       
 18  soil_moisture_28_to_100cm (m³/m³)        213288 non-null  float64       
 19  soil_moisture_100_to_255cm (m³/m³)       213288 non-null  float64       
 20  is_day ()                                213288 non-null  int64         
 21  sunshine_duration (s)                    213288 non-null  float64       
 22  shortwave_radiation_instant (W/m²)       213288 non-null  float64       
 23  direct_radiation_instant (W/m²)          213288 non-null  float64       
 24  diffuse_radiation_instant (W/m²)         213288 non-null  float64       
 25  direct_normal_irradiance_instant (W/m²)  213288 non-null  float64       
 26  terrestrial_radiation_instant (W/m²)     213288 non-null  float64       
 27  year                                     213288 non-null  int64         
 28  season                                   213288 non-null  int64         
 29  month                                    213288 non-null  int64         
 30  week                                     213288 non-null  UInt32        
 31  day_of_week                              213288 non-null  int64         
 32  hour                                     213288 non-null  int64         
 33  day_of_month                             213288 non-null  int64         
 34  day_of_year                              213288 non-null  int64         
dtypes: UInt32(1), datetime64[ns](1), float64(23), int64(10)
memory usage: 56.3 MB
In [ ]:
# Converting 'acq_date' to a datetime object
filtered_galicia_fires_00_22['acq_date'] = pd.to_datetime(filtered_galicia_fires_00_22['acq_date'], format='%Y-%m-%d')

# Converting 'acq_time' to hh:mm format and then to a time object
filtered_galicia_fires_00_22['acq_time'] = filtered_galicia_fires_00_22['acq_time'].apply(lambda x: pd.to_datetime(x, format='%H%M').time())

# Combining 'acq_date' and 'acq_time' into a single datetime column
filtered_galicia_fires_00_22['datetime'] = filtered_galicia_fires_00_22.apply(lambda row: pd.Timestamp.combine(row['acq_date'], row['acq_time']), axis=1)

# Extracting the year
filtered_galicia_fires_00_22['year'] = filtered_galicia_fires_00_22['datetime'].dt.year

# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1  # Winter
    elif month in [3, 4, 5]:
        return 2  # Spring
    elif month in [6, 7, 8]:
        return 3  # Summer
    else:
        return 4  # Autumn

# Applying the function to the DataFrame
filtered_galicia_fires_00_22['season'] = filtered_galicia_fires_00_22['datetime'].dt.month.apply(get_season)

# Extracting the month
filtered_galicia_fires_00_22['month'] = filtered_galicia_fires_00_22['datetime'].dt.month

# Extracting the week of the year
filtered_galicia_fires_00_22['week'] = filtered_galicia_fires_00_22['datetime'].dt.isocalendar().week

# Extracting the day of the week (1 = Monday, 7 = Sunday)
filtered_galicia_fires_00_22['day_of_week'] = filtered_galicia_fires_00_22['datetime'].dt.dayofweek + 1

# Extracting the hour, shifted from 0-23 to a 1-24 range
filtered_galicia_fires_00_22['hour'] = filtered_galicia_fires_00_22['datetime'].dt.hour + 1

# Extracting the day of the month
filtered_galicia_fires_00_22['day_of_month'] = filtered_galicia_fires_00_22['datetime'].dt.day

# Extracting the day of the year
filtered_galicia_fires_00_22['day_of_year'] = filtered_galicia_fires_00_22['datetime'].dt.dayofyear

filtered_galicia_fires_00_22.head(5)
Out[ ]:
latitude longitude brightness scan track acq_date acq_time confidence bright_t31 frp ... type datetime year season month week day_of_week hour day_of_month day_of_year
172 42.5118 -8.4374 300.4 1.1 1.0 2001-02-17 11:54:00 36 286.7 5.6 ... 0 2001-02-17 11:54:00 2001 1 2 7 6 12 17 48
177 42.2953 -8.2946 305.0 1.0 1.0 2001-02-19 11:42:00 60 283.9 8.7 ... 0 2001-02-19 11:42:00 2001 1 2 8 1 12 19 50
178 42.2688 -8.2864 311.8 1.0 1.0 2001-02-19 22:48:00 83 275.9 16.2 ... 0 2001-02-19 22:48:00 2001 1 2 8 1 23 19 50
186 42.2428 -6.8630 314.8 1.1 1.0 2001-02-21 11:30:00 71 279.2 15.2 ... 0 2001-02-21 11:30:00 2001 1 2 8 3 12 21 52
187 42.2881 -8.3451 317.1 1.2 1.1 2001-02-21 11:30:00 77 288.3 20.2 ... 0 2001-02-21 11:30:00 2001 1 2 8 3 12 21 52

5 rows × 21 columns

In [ ]:
filtered_galicia_fires_00_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 21 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   latitude      22277 non-null  float64       
 1   longitude     22277 non-null  float64       
 2   brightness    22277 non-null  float64       
 3   scan          22277 non-null  float64       
 4   track         22277 non-null  float64       
 5   acq_date      22277 non-null  datetime64[ns]
 6   acq_time      22277 non-null  object        
 7   confidence    22277 non-null  int64         
 8   bright_t31    22277 non-null  float64       
 9   frp           22277 non-null  float64       
 10  daynight      22277 non-null  object        
 11  type          22277 non-null  int64         
 12  datetime      22277 non-null  datetime64[ns]
 13  year          22277 non-null  int64         
 14  season        22277 non-null  int64         
 15  month         22277 non-null  int64         
 16  week          22277 non-null  UInt32        
 17  day_of_week   22277 non-null  int64         
 18  hour          22277 non-null  int64         
 19  day_of_month  22277 non-null  int64         
 20  day_of_year   22277 non-null  int64         
dtypes: UInt32(1), datetime64[ns](2), float64(7), int64(9), object(2)
memory usage: 3.7+ MB
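Converting acq_time with a per-row apply, as above, is relatively slow on 22k rows, and the strptime approach may misparse three-digit times such as 130 (read as 13:00 rather than 01:30). A vectorized alternative sketch, assuming acq_time is an integer in HHMM form (the function name is ours):

```python
import pandas as pd

def combine_acq_datetime(df, date_col="acq_date", time_col="acq_time"):
    """Build a single datetime column from a date column and an integer HHMM time column."""
    hhmm = df[time_col].astype(int)
    # Split HHMM arithmetically: 1154 -> 11 hours + 54 minutes, 130 -> 1 hour + 30 minutes
    offset = (pd.to_timedelta(hhmm // 100, unit="h")
              + pd.to_timedelta(hhmm % 100, unit="m"))
    return pd.to_datetime(df[date_col]) + offset

# filtered_galicia_fires_00_22["datetime"] = combine_acq_datetime(filtered_galicia_fires_00_22)
```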
In [ ]:
frp_firedata01_22 = filtered_galicia_fires_00_22.drop(['brightness', 'scan','track','daynight', 'type','bright_t31'], axis=1)
frp_firedata01_22.head(5)
Out[ ]:
latitude longitude acq_date acq_time confidence frp datetime year season month week day_of_week hour day_of_month day_of_year
172 42.5118 -8.4374 2001-02-17 11:54:00 36 5.6 2001-02-17 11:54:00 2001 1 2 7 6 12 17 48
177 42.2953 -8.2946 2001-02-19 11:42:00 60 8.7 2001-02-19 11:42:00 2001 1 2 8 1 12 19 50
178 42.2688 -8.2864 2001-02-19 22:48:00 83 16.2 2001-02-19 22:48:00 2001 1 2 8 1 23 19 50
186 42.2428 -6.8630 2001-02-21 11:30:00 71 15.2 2001-02-21 11:30:00 2001 1 2 8 3 12 21 52
187 42.2881 -8.3451 2001-02-21 11:30:00 77 20.2 2001-02-21 11:30:00 2001 1 2 8 3 12 21 52
In [ ]:
frp_firedata01_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   latitude      22277 non-null  float64       
 1   longitude     22277 non-null  float64       
 2   acq_date      22277 non-null  datetime64[ns]
 3   acq_time      22277 non-null  object        
 4   confidence    22277 non-null  int64         
 5   frp           22277 non-null  float64       
 6   datetime      22277 non-null  datetime64[ns]
 7   year          22277 non-null  int64         
 8   season        22277 non-null  int64         
 9   month         22277 non-null  int64         
 10  week          22277 non-null  UInt32        
 11  day_of_week   22277 non-null  int64         
 12  hour          22277 non-null  int64         
 13  day_of_month  22277 non-null  int64         
 14  day_of_year   22277 non-null  int64         
dtypes: UInt32(1), datetime64[ns](2), float64(3), int64(8), object(1)
memory usage: 2.7+ MB

Loading the total pollutant greenhouse gases released dataset for 2002-2023

In [ ]:
galicia_total_pollutant = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\emission_gfed_full_2002_2023.csv")
galicia_total_pollutant = galicia_total_pollutant.query("country == 'Spain' and region == 'Galicia'")
galicia_total_pollutant = galicia_total_pollutant.drop(['gid_0', 'country','gid_1'], axis=1)
galicia_total_pollutant.head(5)
Out[ ]:
year month region CO2 CO TPM PM25 TPC NMHC OC CH4 SO2 BC NOx
858 2002 1 Galicia 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
4468 2002 2 Galicia 22535.052 1202.291 228.750 165.325 125.476 112.970 118.445 47.629 13.854 7.024 29.088
8078 2002 3 Galicia 34622.885 1515.237 238.283 183.307 106.545 110.338 97.419 54.264 13.718 8.995 67.585
11688 2002 4 Galicia 80636.228 3897.548 672.776 498.031 335.511 326.068 312.258 148.580 39.561 23.075 133.829
15298 2002 5 Galicia 1879.408 70.227 9.475 7.992 3.344 3.790 2.921 2.163 0.535 0.412 4.347
In [ ]:
galicia_total_pollutant.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 264 entries, 858 to 950288
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    264 non-null    int64  
 1   month   264 non-null    int64  
 2   region  264 non-null    object 
 3   CO2     264 non-null    float64
 4   CO      264 non-null    float64
 5   TPM     264 non-null    float64
 6   PM25    264 non-null    float64
 7   TPC     264 non-null    float64
 8   NMHC    264 non-null    float64
 9   OC      264 non-null    float64
 10  CH4     264 non-null    float64
 11  SO2     264 non-null    float64
 12  BC      264 non-null    float64
 13  NOx     264 non-null    float64
dtypes: float64(11), int64(2), object(1)
memory usage: 30.9+ KB

Loading and parsing the burnt-area fire dataset for 2003-2018

In [ ]:
galicia_burned_area_byfires_03_18 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\galicia_burned_area_bywildfires_03_18.csv")
galicia_burned_area_byfires_03_18 = galicia_burned_area_byfires_03_18.drop(['numeroparte', 'idcomunidad'], axis=1)
galicia_burned_area_byfires_03_18.head(5)
Out[ ]:
deteccion idprovincia burnt_area latitude longitude
0 2003-01-15 18:30:00 15 0.50 43.501195 -8.012159
1 2003-01-16 20:10:00 15 1.50 43.501195 -8.012159
2 2003-01-17 08:50:00 15 2.05 42.988479 -9.238336
3 2003-01-28 21:40:00 15 0.35 42.709977 -8.787082
4 2003-02-13 13:55:00 15 0.01 43.520902 -8.189201
In [ ]:
# Converting 'deteccion' to datetime object
galicia_burned_area_byfires_03_18['deteccion'] = pd.to_datetime(galicia_burned_area_byfires_03_18['deteccion'])

# Extracting the year
galicia_burned_area_byfires_03_18['year'] = galicia_burned_area_byfires_03_18['deteccion'].dt.year

# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1  # Winter
    elif month in [3, 4, 5]:
        return 2  # Spring
    elif month in [6, 7, 8]:
        return 3  # Summer
    else:
        return 4  # Autumn

# Applying the function to the DataFrame
galicia_burned_area_byfires_03_18['season'] = galicia_burned_area_byfires_03_18['deteccion'].dt.month.apply(get_season)

# Extracting the month
galicia_burned_area_byfires_03_18['month'] = galicia_burned_area_byfires_03_18['deteccion'].dt.month

# Extracting the week of the year
galicia_burned_area_byfires_03_18['week'] = galicia_burned_area_byfires_03_18['deteccion'].dt.isocalendar().week

# Extracting the day of the week (1 = Monday, 7 = Sunday)
galicia_burned_area_byfires_03_18['day_of_week'] = galicia_burned_area_byfires_03_18['deteccion'].dt.dayofweek + 1

# Extracting the hour, shifted from 0-23 to a 1-24 range
galicia_burned_area_byfires_03_18['hour'] = galicia_burned_area_byfires_03_18['deteccion'].dt.hour + 1

# Extracting the day of the month
galicia_burned_area_byfires_03_18['day_of_month'] = galicia_burned_area_byfires_03_18['deteccion'].dt.day

# Extracting the day of the year
galicia_burned_area_byfires_03_18['day_of_year'] = galicia_burned_area_byfires_03_18['deteccion'].dt.dayofyear

galicia_burned_area_byfires_03_18.head(5)
Out[ ]:
deteccion idprovincia burnt_area latitude longitude year season month week day_of_week hour day_of_month day_of_year
0 2003-01-15 18:30:00 15 0.50 43.501195 -8.012159 2003 1 1 3 3 19 15 15
1 2003-01-16 20:10:00 15 1.50 43.501195 -8.012159 2003 1 1 3 4 21 16 16
2 2003-01-17 08:50:00 15 2.05 42.988479 -9.238336 2003 1 1 3 5 9 17 17
3 2003-01-28 21:40:00 15 0.35 42.709977 -8.787082 2003 1 1 5 2 22 28 28
4 2003-02-13 13:55:00 15 0.01 43.520902 -8.189201 2003 1 2 7 4 14 13 44
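The same season and time-feature block appears three times above (for the weather, fire, and burnt-area datasets). It could be factored into one helper; a sketch, assuming a datetime64 column (the function name is ours, encodings match the notebook's):

```python
import pandas as pd

def add_time_features(df, time_col):
    """Derive the time-period columns used throughout this notebook from a datetime column."""
    dt = df[time_col].dt
    df["year"] = dt.year
    # Seasons encoded as above: 1=Winter, 2=Spring, 3=Summer, 4=Autumn
    df["season"] = dt.month.map(lambda m: 1 if m in (12, 1, 2)
                                else 2 if m in (3, 4, 5)
                                else 3 if m in (6, 7, 8) else 4)
    df["month"] = dt.month
    df["week"] = dt.isocalendar().week
    df["day_of_week"] = dt.dayofweek + 1   # 1 = Monday, 7 = Sunday
    df["hour"] = dt.hour + 1               # shifted to a 1-24 range, as above
    df["day_of_month"] = dt.day
    df["day_of_year"] = dt.dayofyear
    return df

# galicia_burned_area_byfires_03_18 = add_time_features(galicia_burned_area_byfires_03_18, "deteccion")
```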

Loading the dataset used to justify our focus on Galicia (line and bar plots across the regions of Spain)

In [ ]:
spain_avgburnedarea_avgfires_byregion_02_23 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Avg. Burned Area (ha) divided by Region Area (Km2) and Avg. Nr. of Fires  Region Area (Km2) - [2002-2023].csv")
spain_avgburnedarea_avgfires_byregion_02_23.head(5)
Out[ ]:
Region Burned Area Nr. of Fires
0 Andalucía 0.238 0.001
1 Aragón 0.084 0.000
2 Cantabria 0.237 0.002
3 Castilla y León 0.220 0.001
4 Castilla-La Mancha 0.078 0.000
In [ ]:
spain_avgburnedarea_avgfires_byregion_02_23.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Region        18 non-null     object 
 1   Burned Area   18 non-null     float64
 2   Nr. of Fires  18 non-null     float64
dtypes: float64(2), object(1)
memory usage: 560.0+ bytes

Loading data showing how much forest loss is due to fires

In [ ]:
spain_yearly_treecoverloss_byfires_01_23 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Spain_treecoverloss_yearly_01_23.csv")
spain_yearly_treecoverloss_byfires_01_23 = spain_yearly_treecoverloss_byfires_01_23.drop(['iso', 'adm1'], axis=1)
spain_yearly_treecoverloss_byfires_01_23.head(5)
Out[ ]:
umd_tree_cover_loss__year umd_tree_cover_loss__ha umd_tree_cover_loss_from_fires__ha
0 2001 8700.494893 1039.163056
1 2002 10416.597912 2271.879901
2 2003 4315.377146 504.531020
3 2004 15337.191094 3345.148959
4 2005 10222.512235 1925.369063
In [ ]:
spain_yearly_treecoverloss_byfires_01_23.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 3 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   umd_tree_cover_loss__year           23 non-null     int64  
 1   umd_tree_cover_loss__ha             23 non-null     float64
 2   umd_tree_cover_loss_from_fires__ha  23 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 680.0 bytes
In [ ]:
irre_data = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\irreplacibility.csv")
fire_data = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_risk.csv")
In [ ]:
firealerts_subregions = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\firealerts_subregion_galicia.csv")
treecover_subregions = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\treecover_subregion_galicia.csv")
In [ ]:
selected_features = ['temperature_2m (°C)',
    'relative_humidity_2m (%)',
    'et0_fao_evapotranspiration (mm)',
    'vapour_pressure_deficit (kPa)',
    'wind_speed_10m (km/h)',
    'soil_temperature_0_to_7cm (°C)',
    'soil_moisture_0_to_7cm (m³/m³)',
    'direct_normal_irradiance_instant (W/m²)']

galicia_weather[selected_features].describe().round(2)
Out[ ]:
temperature_2m (°C) relative_humidity_2m (%) et0_fao_evapotranspiration (mm) vapour_pressure_deficit (kPa) wind_speed_10m (km/h) soil_temperature_0_to_7cm (°C) soil_moisture_0_to_7cm (m³/m³) direct_normal_irradiance_instant (W/m²)
count 213288.00 213288.00 213288.00 213288.00 213288.00 213288.00 213288.00 213288.00
mean 11.73 82.25 0.10 0.32 12.27 12.57 0.31 185.55
std 6.06 14.77 0.15 0.42 6.68 6.18 0.10 283.67
min -5.80 19.00 0.00 0.00 0.00 -2.40 0.09 0.00
25% 7.50 73.00 0.00 0.07 7.10 8.00 0.23 0.00
50% 11.30 87.00 0.02 0.16 11.00 11.90 0.34 0.00
75% 15.50 94.00 0.14 0.41 16.50 16.70 0.39 317.80
max 36.30 100.00 0.80 4.23 55.10 34.40 0.44 983.90
In [ ]:
filtered_galicia_fires_00_22.describe().round(2)
Out[ ]:
latitude longitude brightness scan track confidence bright_t31 frp type year season month week day_of_week hour day_of_month day_of_year
count 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00 22277.00
mean 42.46 -7.81 325.26 1.73 1.25 73.45 293.05 66.26 0.01 2009.29 2.94 7.26 29.55 4.06 16.20 14.62 204.34
std 0.40 0.71 22.83 0.89 0.27 23.24 10.20 125.29 0.12 5.97 0.85 2.48 10.57 2.10 5.73 7.74 74.30
min 41.81 -9.27 300.00 1.00 1.00 0.00 265.10 0.00 0.00 2001.00 1.00 1.00 1.00 1.00 3.00 1.00 1.00
25% 42.14 -8.45 309.70 1.10 1.00 59.00 286.40 14.80 0.00 2005.00 3.00 7.00 27.00 2.00 12.00 8.00 188.00
50% 42.40 -7.78 319.20 1.40 1.20 77.00 292.20 30.20 0.00 2006.00 3.00 8.00 32.00 4.00 14.00 14.00 221.00
75% 42.74 -7.18 333.70 2.10 1.40 95.00 299.90 66.90 0.00 2013.00 4.00 9.00 36.00 6.00 23.00 20.00 248.00
max 43.73 -6.73 505.40 4.80 2.00 100.00 400.10 2956.20 3.00 2022.00 4.00 12.00 53.00 7.00 24.00 31.00 366.00
In [ ]:
galicia_burned_area_byfires_03_18.describe().round(2)
Out[ ]:
idprovincia burnt_area latitude longitude year season month week day_of_week hour day_of_month day_of_year
count 72757.00 72757.00 72575.00 72575.00 72757.00 72757.00 72757.00 72757.00 72757.00 72757.00 72757.00 72757.00
mean 28.26 5.09 42.54 -8.07 2007.73 2.79 6.53 26.60 4.14 15.98 15.65 183.25
std 8.02 60.08 0.53 0.71 4.00 0.92 2.65 11.46 2.02 6.14 8.35 80.36
min 15.00 0.00 4.67 -9.43 2003.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
25% 27.00 0.05 42.17 -8.57 2004.00 2.00 4.00 15.00 2.00 14.00 9.00 104.00
50% 32.00 0.23 42.44 -8.10 2006.00 3.00 7.00 30.00 4.00 17.00 16.00 207.00
75% 36.00 1.00 42.88 -7.60 2011.00 3.00 8.00 35.00 6.00 20.00 22.00 243.00
max 36.00 7352.14 78.64 47.51 2018.00 4.00 12.00 53.00 7.00 24.00 31.00 366.00
In [ ]:
galicia_total_pollutant.describe().round(2)
Out[ ]:
year month CO2 CO TPM PM25 TPC NMHC OC CH4 SO2 BC NOx
count 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00 264.00
mean 2012.50 6.50 51182.32 2417.84 411.11 307.18 201.81 196.74 187.38 90.63 24.15 14.29 88.06
std 6.36 3.46 249922.85 11604.11 1980.41 1495.35 979.10 929.36 910.65 425.13 117.54 68.02 435.89
min 2002.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 2007.00 3.75 370.15 17.43 2.35 1.98 1.04 1.20 0.98 0.61 0.13 0.10 0.58
50% 2012.50 6.50 6157.90 314.05 50.55 39.04 22.74 24.87 20.41 12.09 2.85 1.82 10.54
75% 2018.00 9.25 20546.72 981.43 177.45 127.99 93.03 89.07 87.47 39.37 10.24 5.87 36.76
max 2023.00 12.00 3546581.06 163093.14 27643.65 20972.60 13548.09 12881.31 12582.94 5923.37 1640.21 954.82 6250.45

Exporting the data to CSV for later convenience:

In [ ]:
#datafilename.to_csv('data.csv', index=False)
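As a minimal round-trip sketch (using a hypothetical two-row frame shaped like the weather data), note that `read_csv` returns the time column as plain strings unless it is re-parsed with `parse_dates`:

```python
import io

import pandas as pd

# Hypothetical two-row frame mimicking the weather data's layout
df = pd.DataFrame({
    'time': pd.to_datetime(['2000-01-01 00:00', '2000-01-01 01:00']),
    'temperature_2m (°C)': [1.9, 1.3],
})

# Write to CSV and read it back; parse_dates restores the datetime dtype
buf = io.StringIO(df.to_csv(index=False))
restored = pd.read_csv(buf, parse_dates=['time'])
```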

2 - Data Visualizations and EDA¶

In [ ]:
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))

# Plotting the bar chart
ax1.bar(spain_avgburnedarea_avgfires_byregion_02_23['Region'], spain_avgburnedarea_avgfires_byregion_02_23['Burned Area'], color='maroon', alpha=0.7, label='Burned Area')
ax1.set_xlabel('Regions',fontsize=13)
ax1.set_ylabel('Burned Area',fontsize=15)
ax1.set_xticks(range(len(spain_avgburnedarea_avgfires_byregion_02_23['Region'])))  # fix tick locations before labeling
ax1.set_xticklabels(spain_avgburnedarea_avgfires_byregion_02_23['Region'], rotation=45, ha='right')

# Creating a second y-axis to plot the line chart
ax2 = ax1.twinx()
ax2.plot(spain_avgburnedarea_avgfires_byregion_02_23['Region'], spain_avgburnedarea_avgfires_byregion_02_23['Nr. of Fires'], color='orangered', marker='o', label='Nr. of Fires')
ax2.set_ylabel('Nr. of Fires',fontsize=15)

# Adding title and legend
plt.title('Mean Burned Area in ha per $Km^{2}$ Area and Mean Wildfire Incidents per $Km^{2}$ Area by Regions of Spain - [2002-2023]')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')

# Showing plot
plt.tight_layout()
plt.show()

By burned area, Galicia is the most critical region of Spain for wildfires between 2002 and 2023. Source: https://gwis.jrc.ec.europa.eu/apps/country.profile/downloads

In [ ]:
# Plot
fig, ax = plt.subplots(figsize=(12, 8))

# Plotting the total loss bars
ax.bar(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__ha'], label='Total Annual Tree Cover Loss for the Related Year', color='darkolivegreen')

# Plotting the loss from fires as an overlay, using the same base x-coordinates
ax.bar(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss_from_fires__ha'], label='Annual Amount of Tree Cover Loss due to Wildfires happened in the Related Year', color='brown')

# Adding labels, title, and gridlines
ax.set_xlabel('Years', fontsize=14)
ax.set_ylabel('Tree Cover Loss (ha)', fontsize=15)
ax.set_title('Total Annual Tree Cover Loss and Annual Tree Cover Loss due to Wildfires in Galicia, Spain', fontsize=14)
ax.legend()
ax.grid(axis='y', linestyle='--', alpha=0.6) 

# Adjusting the x-axis ticks
ax.set_xticks(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'])
ax.set_xticklabels(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], rotation=45)

# Increasing the number of y-axis ticks
ax.yaxis.set_major_locator(plt.MaxNLocator(10))  

# Display the plot
plt.tight_layout()
plt.show()

The total annual tree cover loss is shown, with the portion caused by wildfires overlaid within each bar. Source: https://www.globalforestwatch.org/dashboards/global/

In [ ]:
firealerts_subregions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1012 entries, 0 to 1011
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   subregion     1012 non-null   int64
 1   alert__count  1012 non-null   int64
dtypes: int64(2)
memory usage: 15.9 KB
In [ ]:
treecover_subregions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   subregion  4 non-null      int64  
 1   area__ha   4 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 192.0 bytes
In [ ]:
# Mapping of subregion numbers to names
subregion_map = {1: 'A Coruna', 2: 'Lugo', 3: 'Ourense', 4: 'Pontevedra'}

# Aggregating the tree cover by subregion
treecover_agg = treecover_subregions.groupby('subregion')['area__ha'].sum()

# Replacing subregion numbers with names
treecover_agg.index = treecover_agg.index.map(subregion_map)

# Function to format labels with percentage and area
def autopct_format(values):
    def my_format(pct):
        total = sum(values)
        absolute = round(pct / 100 * total)
        return f'{pct:.1f}% ({absolute} ha)'
    return my_format

# Setting the pastel colors for each subregion
colors = ['#FFB3B3',  # Pastel light red for A Coruna
          '#B3CFFF',  # Pastel light blue for Lugo
          '#D9B3FF',  # Pastel light purple for Ourense
          '#FFFFB3']  # Pastel yellow for Pontevedra

# Creating a pie chart with specified colors
fig, ax = plt.subplots(figsize=(10, 10))
ax.pie(treecover_agg, labels=treecover_agg.index, autopct=autopct_format(treecover_agg),
       startangle=140, colors=colors, textprops={'fontsize': 14}, wedgeprops={'edgecolor': 'black'})
ax.set_title('Tree Cover Distribution by Subregions of Galicia, Spain', fontsize=16)
plt.show()
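The `autopct_format` closure above can be checked in isolation: matplotlib passes each wedge's percentage to the inner function, which converts it back into an absolute area. A self-contained copy of the function (reproduced from the cell above) with toy values:

```python
# Copy of the closure from the pie-chart cell, for standalone testing
def autopct_format(values):
    def my_format(pct):
        total = sum(values)
        absolute = round(pct / 100 * total)
        return f'{pct:.1f}% ({absolute} ha)'
    return my_format

# With toy areas [60, 40] ha, a 60% wedge maps back to 60 ha
label = autopct_format([60, 40])(60.0)
```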
In [ ]:
# Aggregating the alert counts by subregion
firealerts_agg = firealerts_subregions.groupby('subregion')['alert__count'].sum()

# Replacing subregion numbers with names
firealerts_agg.index = firealerts_agg.index.map(subregion_map)

# Sorting the data by alert count
firealerts_agg = firealerts_agg.sort_values(ascending=False)

# Creating a color map for each subregion
colors = ['#A52A2A', '#FF8C00', '#FFD700', '#FFE4B5']
color_map = dict(zip(firealerts_agg.index, colors))

# Creating the plot
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(firealerts_agg.index, firealerts_agg, color=[color_map[subregion] for subregion in firealerts_agg.index], edgecolor='black')

# Adding dots at the end of bars
for bar in bars:
    ax.plot(bar.get_width(), bar.get_y() + bar.get_height() / 2, 'o', color=bar.get_facecolor(), markersize=12)

# Adding alert counts at the end of the bars
for bar in bars:
    ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height() / 2,
            f'{int(bar.get_width())}', va='center', fontsize=12)

# Adjusting the plot
ax.set_xlabel('Total Alerts')
ax.set_title('Total Alerts by Subregions of Galicia, Spain')
ax.invert_yaxis() 
plt.show()
In [ ]:
# Filtering the data based on the criteria
filtered = galicia_burned_area_byfires_03_18[(galicia_burned_area_byfires_03_18['longitude'] < -6.73) &
                             (galicia_burned_area_byfires_03_18['longitude'] > -9.3) &
                             (galicia_burned_area_byfires_03_18['latitude'] > 41.8) &
                             (galicia_burned_area_byfires_03_18['latitude'] < 43.8) &
                             (galicia_burned_area_byfires_03_18['burnt_area'] > 100)]

# Creating a map centered around Galicia
map_galicia = folium.Map(location=[42.7, -8.015], zoom_start=8)
folium.TileLayer('cartodbdark_matter').add_to(map_galicia)

# Adding title using custom HTML
title_html = '''
     <h3 align="center" style="font-size:20px"><b>Historical Wildfires in Galicia That Burned More Than 100 ha</b></h3>
     '''
map_galicia.get_root().html.add_child(folium.Element(title_html))

# Defining the color mapping for each 'idprovincia'
color_map = {
    15: 'red',      # A Coruña
    27: 'blue',     # Lugo
    32: 'purple',   # Ourense
    36: 'yellow'    # Pontevedra
}

# Adding circles for wildfire incidents
for idx, row in filtered.iterrows():
    color = color_map.get(row['idprovincia'], 'blue')  
    tooltip_text = f"Date & Time: {row['deteccion']}<br>Burnt Area: {row['burnt_area']} ha"
    
    folium.Circle(
        location=[row['latitude'], row['longitude']],
        radius=row['burnt_area'],  # circle radius in metres, proportional to burnt area
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.5,
        tooltip=tooltip_text
    ).add_to(map_galicia)

# Adding custom legend
legend_html = '''
     <div style="position: fixed; 
     bottom: 50px; left: 50px; width: 200px; height: 130px; 
     border:2px solid grey; z-index:9999; font-size:14px;
     background-color: white; opacity: 0.9;
     ">
     <b>Subregions of Galicia</b>
     <br>
     <i style="background: red; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>A Coruña
     <br>
     <i style="background: blue; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Lugo
     <br>
     <i style="background: purple; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Ourense
     <br>
     <i style="background: yellow; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Pontevedra
     </div>
     '''
map_galicia.get_root().html.add_child(folium.Element(legend_html))
map_galicia.save(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\map_galicia.html')
# Displaying the map
map_galicia
Out[ ]:
[Interactive folium map of wildfires that burned more than 100 ha]
In [ ]:
# Creating a map centered on Galicia
heatmap_galicia = folium.Map(location=[42.7, -8.015], zoom_start=8)

# Preparing data for heatmap, normalize FRP by dividing by its maximum value
lat_longs = [[row['latitude'], row['longitude'], row['frp'] / filtered_galicia_fires_00_22['frp'].max()] 
             for _, row in filtered_galicia_fires_00_22.iterrows()]

# Adding HeatMap to the folium map
HeatMap(
    lat_longs, 
    radius=2.1,      # Adjust radius for heatmap circles
    blur=2.5,        # Adjust blur for smoother heatmap
    max_zoom=10000,     # Adjust for better visualization
    min_opacity=0.80 # Adjust for a clearer distinction
).add_to(heatmap_galicia)

# Adding title using custom HTML
title_html = '''
     <h3 align="center" style="font-size:20px"><b>Galician Zones Wildfire Severity Heatmap by Considering the Fire Radiative Power (FRP) in Megawatts of Historical Wildfire Incidents 2001-2022 </b></h3>
     '''
heatmap_galicia.get_root().html.add_child(folium.Element(title_html))

# Adding custom legend for the color bar
color_bar_html = '''
     <div style="position: fixed;
     bottom: 50px; left: 50px; width: 120px; height: 150px;
     border:2px solid grey; z-index:9999; font-size:14px;
     background-color:white; opacity: 0.85;">
     <b>FRP Range</b><br>
     <i style="background: #00FF00; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>Low<br>
     <i style="background: #FFFF00; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>Medium<br>
     <i style="background: #FF0000; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>High<br>
     </div>
     '''
heatmap_galicia.get_root().html.add_child(folium.Element(color_bar_html))

heatmap_galicia.save(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\heatmap_galicia.html')

# Displaying the map
heatmap_galicia
Out[ ]:
[Interactive folium heatmap of wildfire severity (FRP), 2001-2022]
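The weight normalization used when building `lat_longs` for the heatmap, sketched on toy FRP values: dividing by the column maximum maps every weight into [0, 1], so the most intense detection carries weight 1.0.

```python
import pandas as pd

# Toy FRP values in megawatts; the real data comes from filtered_galicia_fires_00_22['frp']
frp = pd.Series([10.0, 50.0, 100.0])

# Max-normalize, as done for the HeatMap weights
weights = (frp / frp.max()).tolist()
```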
In [ ]:
# Aggregating CO2 emissions by month
monthly_burnt_area = galicia_total_pollutant.groupby('month')['CO2'].sum().reset_index()

# Creating month names list
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# Mapping month numbers to names
monthly_burnt_area['month_name'] = monthly_burnt_area['month'].apply(lambda x: month_names[x-1])

# Mapping months to angles in the radial plot
angle_mapping = {1: 0, 2: 30, 3: 60, 4: 90, 5: 120, 6: 150, 7: 180, 8: 210, 9: 240, 10: 270, 11: 300, 12: 330}
monthly_burnt_area['theta'] = monthly_burnt_area['month'].map(angle_mapping)

# Creating radial polar plot
fig = go.Figure()

fig.add_trace(go.Barpolar(
    r=monthly_burnt_area['CO2'],
    theta=monthly_burnt_area['theta'], 
    marker_color=[px.colors.sequential.Reds[i % len(px.colors.sequential.Reds)] for i in range(len(monthly_burnt_area))],
    marker_line_color='black',
    marker_line_width=1,
    opacity=0.8
))

# Layout adjustments
fig.update_layout(
    title={
        'text': 'Monthly Distribution of Total Emitted CO<sub>2</sub> Greenhouse Gas in Metric Tons (tonnes) of Wildfires in Galicia 2002-2023, per kilogram of dry matter burned',
        'font': {
            'size': 12  
        }
    },
    polar=dict(
        radialaxis=dict(visible=True, range=[0, monthly_burnt_area['CO2'].max() + 5]),
        angularaxis=dict(
            tickmode='array',
            tickvals=list(angle_mapping.values()),
            ticktext=month_names
        )
    ),
    template="plotly_dark"
)
fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\polarplot.html')
fig.show()
In [ ]:
# Pivoting the data
pivoted_data = pd.pivot_table(galicia_total_pollutant, index='month', values=['CO', 'TPM', 'PM25', 'TPC', 'NMHC', 'OC', 'CH4', 'SO2', 'BC', 'NOx'], aggfunc='sum')

# Creating traces for each pollutant
pollutant_traces = []
pollutants = pivoted_data.columns
colors = px.colors.qualitative.Pastel  

# Adding CO separately
co_trace = go.Bar(
    y=[month - 0.2 for month in pivoted_data.index],  
    x=pivoted_data['CO'],
    name='CO',
    orientation='h',
    marker=dict(color='orange'),
    hoverinfo='x+y+name',
    width=0.4  # Adjust bar width
)
pollutant_traces.append(co_trace)

# Adding other pollutants, excluding CO
stacked_traces = []
for i, pollutant in enumerate([p for p in pollutants if p != 'CO']):
    trace = go.Bar(
        y=[month + 0.2 for month in pivoted_data.index],  
        x=pivoted_data[pollutant],
        name=pollutant,
        orientation='h',
        marker=dict(color=colors[i % len(colors)]),
        hoverinfo='x+y+name',
        width=0.4  # Adjust bar width
    )
    stacked_traces.append(trace)

# Creating tick intervals at every 25k
max_val = pivoted_data.sum().max()
tick_vals = list(range(0, int(max_val + 25000), 25000))

# Creating the layout with increased height
layout = go.Layout(
    title='Monthly Emissions by Other Pollutants released by Wildfire Incidents in Galicia between 2002-2023',
    barmode='stack',
    xaxis=dict(
        title='Total Emissions in Metric Tons (tonnes)',
        showgrid=True,
        tickvals=tick_vals,
        ticktext=[f'{val // 1000}k' for val in tick_vals],
        tickfont=dict(size=14)
    ),
    yaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
        tickfont=dict(size=14)
    ),
    legend=dict(
        title=dict(
            text='Pollutants',
            font=dict(size=16)
        ),
        x=1.05,
        y=1,
        font=dict(size=12)
    ),
    template='plotly_white',
    height=800  
)

# Creating the figure
fig = go.Figure(data=pollutant_traces + stacked_traces, layout=layout)

fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\horizontal_other_polutants.html')
# Showing the interactive plot
fig.show()
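The pivot above can be illustrated on a toy frame (hypothetical values): `pivot_table` with `aggfunc='sum'` totals each pollutant column per month, yielding one row per month.

```python
import pandas as pd

# Hypothetical monthly emissions for two pollutants
df = pd.DataFrame({
    'month': [1, 1, 2],
    'CO':  [10.0, 5.0, 2.0],
    'CH4': [1.0, 0.5, 0.2],
})

# One row per month, each pollutant summed
pivoted = pd.pivot_table(df, index='month', values=['CO', 'CH4'], aggfunc='sum')
```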
In [ ]:
# Converting 'datetime' to a date to group by day
filtered_galicia_fires_00_22['acq_date'] = filtered_galicia_fires_00_22['datetime'].dt.date

# Ensuring that 'acq_date' is a proper DateTimeIndex for the grouping to work
filtered_galicia_fires_00_22.set_index('acq_date', inplace=True)

# Grouping by 'acq_date' and counting the number of wildfires
wildfire_count = filtered_galicia_fires_00_22.groupby(filtered_galicia_fires_00_22.index).size()

# Converting the index to a DatetimeIndex
wildfire_count.index = pd.to_datetime(wildfire_count.index)

# Creating a custom colormap based on the number of wildfires
base_colors = ['lightsalmon', 'salmon', 'tomato', 'red', 'crimson',
               'firebrick', 'brown', 'darkred', 'maroon', 'black']
cmap_colors = [c for c in base_colors for _ in range(5)]  # each color covers 5 counts
custom_cmap = ListedColormap(cmap_colors)

# Plotting the calendar plot
calplot.calplot(wildfire_count, cmap=custom_cmap, vmin=0, vmax=len(cmap_colors), edgecolor='white', linewidth=0.5)

# Adding title with y offset
plt.suptitle('Calendar Plot Showing the Frequency of Daily Wildfire Incidents (2001-2022)', fontsize=16,x=0.45, y=1.0)

# Adjusting layout
plt.tight_layout(rect=[0, 0, 0.80, 0.99])

# Showing the plot
plt.show()
In [ ]:
# Ensuring that the common columns are in the same dtype
common_columns = ['year', 'season', 'month', 'week', 'day_of_week', 'hour','day_of_month', 'day_of_year']

# Converting the data types for the common columns to match between both datasets
frp_firedata01_22 = frp_firedata01_22.astype({col: 'int64' for col in common_columns})
galicia_weather = galicia_weather.astype({col: 'int64' for col in common_columns})

# Performing the merge
merged_hourly_weather_frp_data = pd.merge(
    frp_firedata01_22, 
    galicia_weather, 
    on=common_columns, 
    how='left'  # keep every row from frp_firedata01_22; unmatched weather columns become NaN
)

# Displaying the first few rows of the merged data
merged_hourly_weather_frp_data.head()
Out[ ]:
latitude longitude acq_date acq_time confidence frp datetime year season month ... soil_moisture_7_to_28cm (m³/m³) soil_moisture_28_to_100cm (m³/m³) soil_moisture_100_to_255cm (m³/m³) is_day () sunshine_duration (s) shortwave_radiation_instant (W/m²) direct_radiation_instant (W/m²) diffuse_radiation_instant (W/m²) direct_normal_irradiance_instant (W/m²) terrestrial_radiation_instant (W/m²)
0 42.5118 -8.4374 2001-02-17 11:54:00 36 5.6 2001-02-17 11:54:00 2001 1 2 ... 0.379 0.399 0.422 1 3600.0 169.4 89.9 79.5 349.9 359.8
1 42.2953 -8.2946 2001-02-19 11:42:00 60 8.7 2001-02-19 11:42:00 2001 1 2 ... 0.370 0.392 0.421 1 3600.0 224.7 160.9 63.8 604.1 372.9
2 42.2688 -8.2864 2001-02-19 22:48:00 83 16.2 2001-02-19 22:48:00 2001 1 2 ... 0.367 0.391 0.420 0 0.0 0.0 0.0 0.0 0.0 0.0
3 42.2428 -6.8630 2001-02-21 11:30:00 71 15.2 2001-02-21 11:30:00 2001 1 2 ... 0.363 0.387 0.419 1 3600.0 232.8 165.7 67.1 600.0 386.2
4 42.2881 -8.3451 2001-02-21 11:30:00 77 20.2 2001-02-21 11:30:00 2001 1 2 ... 0.363 0.387 0.419 1 3600.0 232.8 165.7 67.1 600.0 386.2

5 rows × 42 columns
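The semantics of the left merge can be sketched on toy frames (hypothetical values): `how='left'` keeps every fire detection, and the optional `indicator=True` flag marks detections that found no matching weather row.

```python
import pandas as pd

# Toy fire detections and weather observations sharing key columns
fires = pd.DataFrame({'year': [2001, 2001, 2002], 'hour': [11, 22, 11],
                      'frp': [5.6, 16.2, 8.7]})
weather = pd.DataFrame({'year': [2001, 2001], 'hour': [11, 22],
                        'temp': [12.3, 8.1]})

# Left merge: all fire rows survive; unmatched rows get NaN weather and '_merge' == 'left_only'
merged = pd.merge(fires, weather, on=['year', 'hour'], how='left', indicator=True)
```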

In [ ]:
merged_hourly_weather_frp_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 0 to 22276
Data columns (total 42 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   latitude                                 22277 non-null  float64       
 1   longitude                                22277 non-null  float64       
 2   acq_date                                 22277 non-null  datetime64[ns]
 3   acq_time                                 22277 non-null  object        
 4   confidence                               22277 non-null  int64         
 5   frp                                      22277 non-null  float64       
 6   datetime                                 22277 non-null  datetime64[ns]
 7   year                                     22277 non-null  int64         
 8   season                                   22277 non-null  int64         
 9   month                                    22277 non-null  int64         
 10  week                                     22277 non-null  int64         
 11  day_of_week                              22277 non-null  int64         
 12  hour                                     22277 non-null  int64         
 13  day_of_month                             22277 non-null  int64         
 14  day_of_year                              22277 non-null  int64         
 15  time                                     22277 non-null  datetime64[ns]
 16  temperature_2m (°C)                      22277 non-null  float64       
 17  relative_humidity_2m (%)                 22277 non-null  int64         
 18  dew_point_2m (°C)                        22277 non-null  float64       
 19  precipitation (mm)                       22277 non-null  float64       
 20  pressure_msl (hPa)                       22277 non-null  float64       
 21  surface_pressure (hPa)                   22277 non-null  float64       
 22  cloud_cover (%)                          22277 non-null  int64         
 23  et0_fao_evapotranspiration (mm)          22277 non-null  float64       
 24  vapour_pressure_deficit (kPa)            22277 non-null  float64       
 25  wind_speed_10m (km/h)                    22277 non-null  float64       
 26  wind_gusts_10m (km/h)                    22277 non-null  float64       
 27  soil_temperature_0_to_7cm (°C)           22277 non-null  float64       
 28  soil_temperature_7_to_28cm (°C)          22277 non-null  float64       
 29  soil_temperature_28_to_100cm (°C)        22277 non-null  float64       
 30  soil_temperature_100_to_255cm (°C)       22277 non-null  float64       
 31  soil_moisture_0_to_7cm (m³/m³)           22277 non-null  float64       
 32  soil_moisture_7_to_28cm (m³/m³)          22277 non-null  float64       
 33  soil_moisture_28_to_100cm (m³/m³)        22277 non-null  float64       
 34  soil_moisture_100_to_255cm (m³/m³)       22277 non-null  float64       
 35  is_day ()                                22277 non-null  int64         
 36  sunshine_duration (s)                    22277 non-null  float64       
 37  shortwave_radiation_instant (W/m²)       22277 non-null  float64       
 38  direct_radiation_instant (W/m²)          22277 non-null  float64       
 39  diffuse_radiation_instant (W/m²)         22277 non-null  float64       
 40  direct_normal_irradiance_instant (W/m²)  22277 non-null  float64       
 41  terrestrial_radiation_instant (W/m²)     22277 non-null  float64       
dtypes: datetime64[ns](3), float64(26), int64(12), object(1)
memory usage: 7.3+ MB
In [ ]:
# Further filtering the data to keep only rows where confidence_category is 'h'
high_confidence_merged_hourly_weather_frp_data = merged_hourly_weather_frp_data[merged_hourly_weather_frp_data['confidence'] > 90]
In [ ]:
# Grouping firedata by year, day_of_year, and hour to get fire counts
fire_grouped = frp_firedata01_22.groupby(['year', 'day_of_year', 'hour']).size().reset_index(name='fire_count')

# Merging the fire counts with galicia_weather
galicia_weather_firecount_merged = pd.merge(galicia_weather, fire_grouped, on=['year', 'day_of_year', 'hour'], how='left')

# Replacing NaN with 0, since NaN indicates no fire incidents at that time
galicia_weather_firecount_merged['fire_count'] = galicia_weather_firecount_merged['fire_count'].fillna(0).astype(int)

# Displaying the first few rows of the new dataset
galicia_weather_firecount_merged.head()
Out[ ]:
time temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) vapour_pressure_deficit (kPa) ... terrestrial_radiation_instant (W/m²) year season month week day_of_week hour day_of_month day_of_year fire_count
0 2000-01-01 00:00:00 1.9 83 -0.7 0.0 1029.0 963.6 29 0.0 0.12 ... 0.0 2000 1 1 52 6 1 1 1 0
1 2000-01-01 01:00:00 1.3 85 -0.9 0.0 1029.2 963.6 24 0.0 0.10 ... 0.0 2000 1 1 52 6 2 1 1 0
2 2000-01-01 02:00:00 -0.1 89 -1.7 0.0 1029.3 963.4 24 0.0 0.07 ... 0.0 2000 1 1 52 6 3 1 1 0
3 2000-01-01 03:00:00 -1.7 92 -2.9 0.0 1028.8 962.6 16 0.0 0.05 ... 0.0 2000 1 1 52 6 4 1 1 0
4 2000-01-01 04:00:00 -2.2 92 -3.3 0.0 1028.7 962.4 8 0.0 0.04 ... 0.0 2000 1 1 52 6 5 1 1 0

5 rows × 36 columns
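The count-and-merge pattern above, on toy frames (hypothetical values): detections are counted per timestamp key, attached to the weather rows with a left merge, and missing counts, which mean no fires at that hour, become 0.

```python
import pandas as pd

# Toy detections and hourly weather rows
fires = pd.DataFrame({'year': [2001, 2001, 2001], 'hour': [11, 11, 22]})
weather = pd.DataFrame({'year': [2001, 2001, 2001], 'hour': [10, 11, 22]})

# Count detections per (year, hour) ...
counts = fires.groupby(['year', 'hour']).size().reset_index(name='fire_count')

# ... attach to the weather rows; hours without detections become NaN, then 0
merged = pd.merge(weather, counts, on=['year', 'hour'], how='left')
merged['fire_count'] = merged['fire_count'].fillna(0).astype(int)
```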

In [ ]:
# Filtering out rows with fire_count equal to 0
galicia_weather_nonzero_fire = galicia_weather_firecount_merged[galicia_weather_firecount_merged['fire_count'] != 0]
In [ ]:
# Ensuring that the common columns are in the same dtype
common_columns = ['year', 'season', 'month', 'week', 'day_of_week', 'hour','day_of_month', 'day_of_year']

# Converting the data types for the common columns to match between both datasets
galicia_burned_area_byfires_03_18 = galicia_burned_area_byfires_03_18.astype({col: 'int64' for col in common_columns})
galicia_weather = galicia_weather.astype({col: 'int64' for col in common_columns})

# Performing the merge
merged_weather_burntarea_data = pd.merge(
    galicia_burned_area_byfires_03_18, 
    galicia_weather, 
    on=common_columns, 
    how='left'  # keep every burned-area record; unmatched weather columns become NaN
)

# Displaying the first few rows of the merged data
merged_weather_burntarea_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 72757 entries, 0 to 72756
Data columns (total 40 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   deteccion                                72757 non-null  datetime64[ns]
 1   idprovincia                              72757 non-null  int64         
 2   burnt_area                               72757 non-null  float64       
 3   latitude                                 72575 non-null  float64       
 4   longitude                                72575 non-null  float64       
 5   year                                     72757 non-null  int64         
 6   season                                   72757 non-null  int64         
 7   month                                    72757 non-null  int64         
 8   week                                     72757 non-null  int64         
 9   day_of_week                              72757 non-null  int64         
 10  hour                                     72757 non-null  int64         
 11  day_of_month                             72757 non-null  int64         
 12  day_of_year                              72757 non-null  int64         
 13  time                                     72757 non-null  datetime64[ns]
 14  temperature_2m (°C)                      72757 non-null  float64       
 15  relative_humidity_2m (%)                 72757 non-null  int64         
 16  dew_point_2m (°C)                        72757 non-null  float64       
 17  precipitation (mm)                       72757 non-null  float64       
 18  pressure_msl (hPa)                       72757 non-null  float64       
 19  surface_pressure (hPa)                   72757 non-null  float64       
 20  cloud_cover (%)                          72757 non-null  int64         
 21  et0_fao_evapotranspiration (mm)          72757 non-null  float64       
 22  vapour_pressure_deficit (kPa)            72757 non-null  float64       
 23  wind_speed_10m (km/h)                    72757 non-null  float64       
 24  wind_gusts_10m (km/h)                    72757 non-null  float64       
 25  soil_temperature_0_to_7cm (°C)           72757 non-null  float64       
 26  soil_temperature_7_to_28cm (°C)          72757 non-null  float64       
 27  soil_temperature_28_to_100cm (°C)        72757 non-null  float64       
 28  soil_temperature_100_to_255cm (°C)       72757 non-null  float64       
 29  soil_moisture_0_to_7cm (m³/m³)           72757 non-null  float64       
 30  soil_moisture_7_to_28cm (m³/m³)          72757 non-null  float64       
 31  soil_moisture_28_to_100cm (m³/m³)        72757 non-null  float64       
 32  soil_moisture_100_to_255cm (m³/m³)       72757 non-null  float64       
 33  is_day ()                                72757 non-null  int64         
 34  sunshine_duration (s)                    72757 non-null  float64       
 35  shortwave_radiation_instant (W/m²)       72757 non-null  float64       
 36  direct_radiation_instant (W/m²)          72757 non-null  float64       
 37  diffuse_radiation_instant (W/m²)         72757 non-null  float64       
 38  direct_normal_irradiance_instant (W/m²)  72757 non-null  float64       
 39  terrestrial_radiation_instant (W/m²)     72757 non-null  float64       
dtypes: datetime64[ns](2), float64(26), int64(12)
memory usage: 22.8 MB
In [ ]:
# Extracting the week number and year to group by
galicia_weather['year_week'] = galicia_weather['time'].dt.strftime('%Y-%U')

# Aggregating data to weekly, taking the mean for each week
weekly_weather_data = galicia_weather.groupby('year_week').mean(numeric_only=True).reset_index()

# Splitting the 'year_week' column into separate 'year' and 'week' columns
weekly_weather_data[['year', 'week']] = weekly_weather_data['year_week'].str.split('-', expand=True).astype(int)

# Dropping the 'year_week' column
weekly_weather_data.drop(columns=['year_week'], inplace=True)

# Displaying the first few rows of the weekly aggregated data
weekly_weather_data.head()
Out[ ]:
temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) vapour_pressure_deficit (kPa) wind_speed_10m (km/h) ... direct_normal_irradiance_instant (W/m²) terrestrial_radiation_instant (W/m²) year season month week day_of_week hour day_of_month day_of_year
0 1.945833 81.166667 -1.100000 0.000000 1028.962500 963.566667 7.166667 0.034167 0.157500 4.691667 ... 231.166667 139.258333 2000 1.0 1.0 0 6.0 12.5 1.0 1.0
1 5.446429 84.023810 2.852381 0.016071 1025.641071 961.237500 55.160714 0.034643 0.162976 8.160119 ... 140.266667 142.637500 2000 1.0 1.0 1 4.0 12.5 5.0 5.0
2 2.594048 85.452381 0.247619 0.183929 1024.692857 959.710714 42.285714 0.029524 0.118274 12.727381 ... 156.902976 150.438690 2000 1.0 1.0 2 4.0 12.5 12.0 12.0
3 2.028571 84.505952 -0.460714 0.000000 1027.607738 962.308929 13.339286 0.041310 0.131071 10.794048 ... 255.066667 160.467857 2000 1.0 1.0 3 4.0 12.5 19.0 19.0
4 2.569048 83.976190 -0.035119 0.002976 1024.231548 959.273810 41.148810 0.038929 0.135893 9.242857 ... 181.782738 173.112500 2000 1.0 1.0 4 4.0 12.5 26.0 26.0

5 rows × 34 columns
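The next cell redoes the weekly grouping with `isocalendar()` because `strftime('%Y-%U')` and ISO-8601 week numbers disagree at year boundaries. A minimal illustration of the mismatch:

```python
import pandas as pd

# '%U' counts weeks from the first Sunday of the year (days before it fall
# in week 00), while isocalendar() uses ISO-8601 weeks (Monday-based,
# numbered 1-53, possibly belonging to the neighbouring year).
ts = pd.Timestamp('2000-01-01')                      # a Saturday
print(ts.strftime('%Y-%U'))                          # '2000-00'
print(ts.isocalendar().year, ts.isocalendar().week)  # 1999 52
```

This is why the ISO-grouped table starts at (1999, 52) rather than week 0 of 2000.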

In [ ]:
# Extracting the ISO calendar year and week to group by
galicia_weather['year'] = galicia_weather['time'].dt.isocalendar().year
galicia_weather['week'] = galicia_weather['time'].dt.isocalendar().week

# Grouping by the ISO calendar year and week and taking the mean for each week
weekly_weather_data = galicia_weather.groupby(['year', 'week']).mean(numeric_only=True).reset_index()

# Displaying the first few rows of the weekly aggregated data
weekly_weather_data.head()
Out[ ]:
year week temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) ... direct_radiation_instant (W/m²) diffuse_radiation_instant (W/m²) direct_normal_irradiance_instant (W/m²) terrestrial_radiation_instant (W/m²) season month day_of_week hour day_of_month day_of_year
0 1999 52 2.252083 82.395833 -0.608333 0.000000 1028.852083 963.527083 8.145833 0.034375 ... 70.202083 21.145833 225.227083 139.622917 1.0 1.0 6.5 12.5 1.5 1.5
1 2000 1 5.850595 84.660714 3.364881 0.029167 1025.888095 961.562500 63.571429 0.033929 ... 40.952976 26.702976 125.539881 143.596429 1.0 1.0 4.0 12.5 6.0 6.0
2 2000 2 2.138690 84.767857 -0.310119 0.170833 1023.983929 958.941667 33.559524 0.030952 ... 57.545238 21.491071 176.457738 151.745238 1.0 1.0 4.0 12.5 13.0 13.0
3 2000 3 2.170238 84.922619 -0.255357 0.000000 1027.510119 962.251786 16.196429 0.040952 ... 83.503571 24.485119 247.979762 162.075595 1.0 1.0 4.0 12.5 20.0 20.0
4 2000 4 3.189286 84.494048 0.655952 0.002976 1025.488690 960.589286 44.785714 0.041012 ... 66.744048 30.870238 183.569643 175.187500 1.0 1.0 4.0 12.5 27.0 27.0

5 rows × 34 columns

In [ ]:
weekly_weather_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1271 entries, 0 to 1270
Data columns (total 34 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   year                                     1271 non-null   UInt32 
 1   week                                     1271 non-null   UInt32 
 2   temperature_2m (°C)                      1271 non-null   float64
 3   relative_humidity_2m (%)                 1271 non-null   float64
 4   dew_point_2m (°C)                        1271 non-null   float64
 5   precipitation (mm)                       1271 non-null   float64
 6   pressure_msl (hPa)                       1271 non-null   float64
 7   surface_pressure (hPa)                   1271 non-null   float64
 8   cloud_cover (%)                          1271 non-null   float64
 9   et0_fao_evapotranspiration (mm)          1271 non-null   float64
 10  vapour_pressure_deficit (kPa)            1271 non-null   float64
 11  wind_speed_10m (km/h)                    1271 non-null   float64
 12  wind_gusts_10m (km/h)                    1271 non-null   float64
 13  soil_temperature_0_to_7cm (°C)           1271 non-null   float64
 14  soil_temperature_7_to_28cm (°C)          1271 non-null   float64
 15  soil_temperature_28_to_100cm (°C)        1271 non-null   float64
 16  soil_temperature_100_to_255cm (°C)       1271 non-null   float64
 17  soil_moisture_0_to_7cm (m³/m³)           1271 non-null   float64
 18  soil_moisture_7_to_28cm (m³/m³)          1271 non-null   float64
 19  soil_moisture_28_to_100cm (m³/m³)        1271 non-null   float64
 20  soil_moisture_100_to_255cm (m³/m³)       1271 non-null   float64
 21  is_day ()                                1271 non-null   float64
 22  sunshine_duration (s)                    1271 non-null   float64
 23  shortwave_radiation_instant (W/m²)       1271 non-null   float64
 24  direct_radiation_instant (W/m²)          1271 non-null   float64
 25  diffuse_radiation_instant (W/m²)         1271 non-null   float64
 26  direct_normal_irradiance_instant (W/m²)  1271 non-null   float64
 27  terrestrial_radiation_instant (W/m²)     1271 non-null   float64
 28  season                                   1271 non-null   float64
 29  month                                    1271 non-null   float64
 30  day_of_week                              1271 non-null   float64
 31  hour                                     1271 non-null   float64
 32  day_of_month                             1271 non-null   float64
 33  day_of_year                              1271 non-null   float64
dtypes: UInt32(2), float64(32)
memory usage: 330.3 KB
In [ ]:
print(weekly_weather_data['week'].unique())
<IntegerArray>
[52,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53]
Length: 53, dtype: UInt32
In [ ]:
weekly_fire_alerts = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\modis_fire_alerts__count.csv")
weekly_fire_alerts.head(5)
Out[ ]:
adm2 year week alert__count confidence_category
0 2 2016 32 4 h
1 3 2018 24 1 n
2 3 2023 34 1 n
3 3 2012 2 2 n
4 4 2017 31 1 h
In [ ]:
weekly_fire_alerts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1012 entries, 0 to 1011
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   adm2                 1012 non-null   int64 
 1   year                 1012 non-null   int64 
 2   week                 1012 non-null   int64 
 3   alert__count         1012 non-null   int64 
 4   confidence_category  1012 non-null   object
dtypes: int64(4), object(1)
memory usage: 39.7+ KB
In [ ]:
# Merging the two datasets on 'year' and 'week' columns
merged_weekly_firealert_weather_data = pd.merge(weekly_weather_data, weekly_fire_alerts, on=['year', 'week'], how='left')
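Before filtering, it can help to verify how many weekly weather rows actually matched an alert record; `pd.merge(..., indicator=True)` adds a `_merge` column for exactly this. A toy sketch with made-up frames, not the project data:

```python
import pandas as pd

# The left join keeps every weather week; weeks without an alert get NaN
# in 'alert__count' and are tagged 'left_only' by the indicator column.
weather = pd.DataFrame({'year': [2012, 2012, 2012], 'week': [1, 2, 3]})
alerts = pd.DataFrame({'year': [2012], 'week': [2], 'alert__count': [3]})
merged = pd.merge(weather, alerts, on=['year', 'week'], how='left', indicator=True)
print(merged['_merge'].tolist())  # ['left_only', 'both', 'left_only']
```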
In [ ]:
# Filtering the rows where alert__count is not NaN and greater than zero
filtered_weekly_datav1 = merged_weekly_firealert_weather_data[merged_weekly_firealert_weather_data['alert__count'].notna()
                                                              & (merged_weekly_firealert_weather_data['alert__count'] > 0)]
In [ ]:
filtered_weekly_datav1.head()
Out[ ]:
year week temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) ... terrestrial_radiation_instant (W/m²) season month day_of_week hour day_of_month day_of_year adm2 alert__count confidence_category
628 2012 2 4.217262 87.547619 2.163690 0.052976 1026.578571 961.840476 35.559524 0.035833 ... 150.533333 1.0 1.0 4.0 12.5 12.0 12.0 3.0 2.0 n
630 2012 4 4.857738 86.434524 2.619048 0.014881 1025.511310 960.983929 36.309524 0.041131 ... 173.393452 1.0 1.0 4.0 12.5 26.0 26.0 4.0 3.0 n
631 2012 4 4.857738 86.434524 2.619048 0.014881 1025.511310 960.983929 36.309524 0.041131 ... 173.393452 1.0 1.0 4.0 12.5 26.0 26.0 3.0 2.0 n
632 2012 4 4.857738 86.434524 2.619048 0.014881 1025.511310 960.983929 36.309524 0.041131 ... 173.393452 1.0 1.0 4.0 12.5 26.0 26.0 2.0 1.0 n
633 2012 4 4.857738 86.434524 2.619048 0.014881 1025.511310 960.983929 36.309524 0.041131 ... 173.393452 1.0 1.0 4.0 12.5 26.0 26.0 4.0 2.0 h

5 rows × 37 columns

In [ ]:
# Further filtering the data to keep only rows where confidence_category is 'h'
filtered_weekly_datav2 = filtered_weekly_datav1[filtered_weekly_datav1['confidence_category'] == 'h']
In [ ]:
# Creating a custom diverging colormap from the two halves of 'RdBu_r'
base_cmap = plt.get_cmap('RdBu_r', 128)
custom_cmap = ListedColormap(np.vstack((base_cmap(np.linspace(0.5, 1, 128)), base_cmap(np.linspace(0, 0.5, 128)))))

# Selecting the columns for correlation analysis
columns_of_interest = [
    'temperature_2m (°C)', 'relative_humidity_2m (%)', 'dew_point_2m (°C)', 'precipitation (mm)', 
    'pressure_msl (hPa)', 'surface_pressure (hPa)', 'cloud_cover (%)', 
    'et0_fao_evapotranspiration (mm)', 'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)', 
    'wind_gusts_10m (km/h)', 'soil_temperature_0_to_7cm (°C)', 'soil_temperature_7_to_28cm (°C)', 
    'soil_temperature_28_to_100cm (°C)', 'soil_temperature_100_to_255cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)', 
    'soil_moisture_7_to_28cm (m³/m³)', 'soil_moisture_28_to_100cm (m³/m³)', 'soil_moisture_100_to_255cm (m³/m³)', 
    'sunshine_duration (s)', 'shortwave_radiation_instant (W/m²)', 'direct_radiation_instant (W/m²)', 
    'diffuse_radiation_instant (W/m²)', 'direct_normal_irradiance_instant (W/m²)', 'terrestrial_radiation_instant (W/m²)',
    'alert__count'
]

# Computing the correlation matrix
corr_matrix = filtered_weekly_datav2[columns_of_interest].corr()

# Extracting the last row ('alert__count') correlation
alert_corr = corr_matrix.loc['alert__count']

# Creating the heatmap with seaborn
plt.figure(figsize=(15, 1))
sns.heatmap(alert_corr.values.reshape(1, -1), annot=True, cmap=custom_cmap, fmt='.2f', xticklabels=alert_corr.index, yticklabels=['alert__count'], cbar=False, linewidths=.5, center=0)

# Setting plot title
plt.title('Correlation of Fire Alert Counts with Weather Parameters')


plt.savefig(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\images\correlation.png')
# Showing the plot
plt.show()
No description has been provided for this image
In [ ]:
filtered_weekly_datav2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 197 entries, 633 to 1974
Data columns (total 37 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   year                                     197 non-null    UInt32 
 1   week                                     197 non-null    UInt32 
 2   temperature_2m (°C)                      197 non-null    float64
 3   relative_humidity_2m (%)                 197 non-null    float64
 4   dew_point_2m (°C)                        197 non-null    float64
 5   precipitation (mm)                       197 non-null    float64
 6   pressure_msl (hPa)                       197 non-null    float64
 7   surface_pressure (hPa)                   197 non-null    float64
 8   cloud_cover (%)                          197 non-null    float64
 9   et0_fao_evapotranspiration (mm)          197 non-null    float64
 10  vapour_pressure_deficit (kPa)            197 non-null    float64
 11  wind_speed_10m (km/h)                    197 non-null    float64
 12  wind_gusts_10m (km/h)                    197 non-null    float64
 13  soil_temperature_0_to_7cm (°C)           197 non-null    float64
 14  soil_temperature_7_to_28cm (°C)          197 non-null    float64
 15  soil_temperature_28_to_100cm (°C)        197 non-null    float64
 16  soil_temperature_100_to_255cm (°C)       197 non-null    float64
 17  soil_moisture_0_to_7cm (m³/m³)           197 non-null    float64
 18  soil_moisture_7_to_28cm (m³/m³)          197 non-null    float64
 19  soil_moisture_28_to_100cm (m³/m³)        197 non-null    float64
 20  soil_moisture_100_to_255cm (m³/m³)       197 non-null    float64
 21  is_day ()                                197 non-null    float64
 22  sunshine_duration (s)                    197 non-null    float64
 23  shortwave_radiation_instant (W/m²)       197 non-null    float64
 24  direct_radiation_instant (W/m²)          197 non-null    float64
 25  diffuse_radiation_instant (W/m²)         197 non-null    float64
 26  direct_normal_irradiance_instant (W/m²)  197 non-null    float64
 27  terrestrial_radiation_instant (W/m²)     197 non-null    float64
 28  season                                   197 non-null    float64
 29  month                                    197 non-null    float64
 30  day_of_week                              197 non-null    float64
 31  hour                                     197 non-null    float64
 32  day_of_month                             197 non-null    float64
 33  day_of_year                              197 non-null    float64
 34  adm2                                     197 non-null    float64
 35  alert__count                             197 non-null    float64
 36  confidence_category                      197 non-null    object 
dtypes: UInt32(2), float64(34), object(1)
memory usage: 57.3+ KB
In [ ]:
filtered_weekly_datav2.head(5)
Out[ ]:
year week temperature_2m (°C) relative_humidity_2m (%) dew_point_2m (°C) precipitation (mm) pressure_msl (hPa) surface_pressure (hPa) cloud_cover (%) et0_fao_evapotranspiration (mm) ... terrestrial_radiation_instant (W/m²) season month day_of_week hour day_of_month day_of_year adm2 alert__count confidence_category
633 2012 4 4.857738 86.434524 2.619048 0.014881 1025.511310 960.983929 36.309524 0.041131 ... 173.393452 1.000000 1.000000 4.0 12.5 26.000000 26.0 4.0 2.0 h
645 2012 8 6.883333 78.547619 2.991667 0.000000 1027.979762 963.739286 23.523810 0.081726 ... 244.268452 1.000000 2.000000 4.0 12.5 23.000000 54.0 3.0 2.0 h
651 2012 9 8.844048 83.238095 5.794048 0.026190 1024.092857 960.533333 48.523810 0.076607 ... 265.011310 1.571429 2.571429 4.0 12.5 13.428571 61.0 1.0 2.0 h
656 2012 9 8.844048 83.238095 5.794048 0.026190 1024.092857 960.533333 48.523810 0.076607 ... 265.011310 1.571429 2.571429 4.0 12.5 13.428571 61.0 4.0 2.0 h
657 2012 9 8.844048 83.238095 5.794048 0.026190 1024.092857 960.533333 48.523810 0.076607 ... 265.011310 1.571429 2.571429 4.0 12.5 13.428571 61.0 3.0 15.0 h

5 rows × 37 columns

In [ ]:
print(filtered_weekly_datav2['alert__count'].unique())
[  2.  15.   3.   6.   7.   1.   5.  18.   8.  44.   4.   9.  42.  12.
  34.  31.  57.  10.  62.  68.  16.  43.  21.  70. 112.  64.  14.  30.]

Machine Learning Part¶

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import xgboost as xgb
In [ ]:
# ML Step 1: Data Preprocessing
# Assuming `filtered_weekly_datav2` is already loaded
df = filtered_weekly_datav2.copy()
In [ ]:
# Removing rows with missing values
df.dropna(inplace=True)
In [ ]:
# ML Step 2: Defining Features and Target
# Defining the target variable: 1 if 'alert__count' > 10, else 0
df['high_alert'] = (df['alert__count'] > 10).astype(int)
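With fewer than 200 rows and a cut at 10 alerts, the positive class is small (the later test split holds 7 positives against 53 negatives), so checking class balance before modelling is worthwhile. A sketch with hypothetical counts, not the real series:

```python
import pandas as pd

# value_counts(normalize=True) shows the class imbalance created by the cut.
alert__count = pd.Series([2, 15, 3, 6, 44, 1, 12, 7])  # toy values
high_alert = (alert__count > 10).astype(int)
print(high_alert.value_counts(normalize=True).to_dict())  # {0: 0.625, 1: 0.375}
```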
In [ ]:
# Selecting relevant features
features = ['season', 'month', 'week', 'temperature_2m (°C)', 'relative_humidity_2m (%)', 
            'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
            'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)', 
            'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)', 
            'direct_normal_irradiance_instant (W/m²)']
In [ ]:
X = df[features]
y = df['high_alert']
In [ ]:
# ML Step 3: Data Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
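Given the imbalance, a stratified split (a variant of the plain split above, not what the notebook uses) keeps the positive fraction equal across train and test. A sketch with synthetic labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stratify=y preserves the 10% positive rate in both partitions exactly.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 10 + [0] * 90)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42, stratify=y)
print(y_tr.sum(), y_te.sum())  # 7 3
```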
In [ ]:
# ML Step 4: Training XGBoost Model
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X_train, y_train)
Out[ ]:
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=None, n_jobs=None,
              num_parallel_tree=None, random_state=None, ...)
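The classes are imbalanced (the report below shows support 53 vs 7), and the default `XGBClassifier()` does not compensate. XGBoost's usual knob is `scale_pos_weight`, roughly the ratio of negatives to positives in the training labels; a sketch of the computation with hypothetical counts:

```python
# Hypothetical training-label counts, not the real y_train.
y_train_toy = [0] * 130 + [1] * 7
neg = y_train_toy.count(0)
pos = y_train_toy.count(1)
scale_pos_weight = neg / pos  # could be passed as XGBClassifier(scale_pos_weight=...)
print(round(scale_pos_weight, 1))  # 18.6
```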
In [ ]:
# ML Step 5: Model Evaluation
y_pred = xgb_model.predict(X_test)
In [ ]:
# Confusion Matrix and Classification Report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
[[49  4]
 [ 4  3]]
              precision    recall  f1-score   support

           0       0.92      0.92      0.92        53
           1       0.43      0.43      0.43         7

    accuracy                           0.87        60
   macro avg       0.68      0.68      0.68        60
weighted avg       0.87      0.87      0.87        60

In [ ]:
from sklearn.inspection import permutation_importance

# Calculating permutation importance
perm_importance = permutation_importance(xgb_model, X_test, y_test, n_repeats=10, random_state=42)

# Printing feature importances
feature_importance = pd.Series(perm_importance.importances_mean, index=X_test.columns)
print("Permutation Feature Importance:")
print(feature_importance.sort_values(ascending=False))
Permutation Feature Importance:
wind_speed_10m (km/h)                      0.015000
vapour_pressure_deficit (kPa)              0.005000
month                                      0.000000
week                                       0.000000
direct_normal_irradiance_instant (W/m²)    0.000000
precipitation (mm)                        -0.005000
soil_temperature_0_to_7cm (°C)            -0.008333
relative_humidity_2m (%)                  -0.008333
et0_fao_evapotranspiration (mm)           -0.010000
temperature_2m (°C)                       -0.011667
season                                    -0.013333
soil_moisture_0_to_7cm (m³/m³)            -0.023333
dtype: float64
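Several features above have zero or negative mean importance: on a 60-row test set, shuffling a feature can raise the score by chance, so near-zero values mean "no measurable contribution" rather than active harm. A self-contained sketch (using a simple linear model rather than XGBoost) of the expected contrast between an informative feature and pure noise:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)              # label depends only on column 0
model = LogisticRegression().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)             # column 0 large, column 1 near 0
```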
In [ ]:
from sklearn.inspection import PartialDependenceDisplay

# Plotting partial dependence for key features
key_features = ['temperature_2m (°C)', 'precipitation (mm)', 'relative_humidity_2m (%)']
PartialDependenceDisplay.from_estimator(xgb_model, X_test, key_features)
plt.show()

No description has been provided for this image
In [ ]:
high_fire = df[df['high_alert'] == 1][features]
print("High Fire Counts Feature Means:")
print(high_fire.mean())
High Fire Counts Feature Means:
season                                       3.308571
month                                        8.102857
week                                        32.960000
temperature_2m (°C)                         19.103929
relative_humidity_2m (%)                    70.141190
precipitation (mm)                           0.033500
et0_fao_evapotranspiration (mm)              0.178805
vapour_pressure_deficit (kPa)                0.835679
wind_speed_10m (km/h)                       12.126881
soil_temperature_0_to_7cm (°C)              20.664310
soil_moisture_0_to_7cm (m³/m³)               0.141064
direct_normal_irradiance_instant (W/m²)    300.593048
dtype: float64
In [ ]:
print(high_fire.std())
season                                      0.601416
month                                       1.517203
week                                        6.592167
temperature_2m (°C)                         3.515758
relative_humidity_2m (%)                    9.240537
precipitation (mm)                          0.067844
et0_fao_evapotranspiration (mm)             0.051497
vapour_pressure_deficit (kPa)               0.348321
wind_speed_10m (km/h)                       2.623547
soil_temperature_0_to_7cm (°C)              3.813663
soil_moisture_0_to_7cm (m³/m³)              0.061367
direct_normal_irradiance_instant (W/m²)    79.538858
dtype: float64
In [ ]:
low_fire = df[df['high_alert'] == 0][features]
print("Low Fire Counts Feature Means:")
print(low_fire.mean())
Low Fire Counts Feature Means:
season                                       2.882890
month                                        6.745017
week                                        27.523256
temperature_2m (°C)                         15.577357
relative_humidity_2m (%)                    74.007821
precipitation (mm)                           0.039708
et0_fao_evapotranspiration (mm)              0.150774
vapour_pressure_deficit (kPa)                0.597784
wind_speed_10m (km/h)                       11.970449
soil_temperature_0_to_7cm (°C)              17.142701
soil_moisture_0_to_7cm (m³/m³)               0.182801
direct_normal_irradiance_instant (W/m²)    281.338611
dtype: float64
In [ ]:
print(low_fire.std())
season                                      0.864864
month                                       2.517798
week                                       10.920475
temperature_2m (°C)                         4.535791
relative_humidity_2m (%)                    5.887160
precipitation (mm)                          0.071569
et0_fao_evapotranspiration (mm)             0.048418
vapour_pressure_deficit (kPa)               0.238383
wind_speed_10m (km/h)                       2.717183
soil_temperature_0_to_7cm (°C)              5.143733
soil_moisture_0_to_7cm (m³/m³)              0.083412
direct_normal_irradiance_instant (W/m²)    72.368410
dtype: float64
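The high-fire feature means above are what the forecast section later hard-codes as thresholds, with 'above'/'below' chosen by whether the high-fire group sits above or below the low-fire group on that feature. A sketch of deriving such a dict programmatically, using two features with rounded toy values:

```python
import pandas as pd

high_means = pd.Series({'temperature_2m (°C)': 19.10,
                        'relative_humidity_2m (%)': 70.14})
low_means = pd.Series({'temperature_2m (°C)': 15.58,
                       'relative_humidity_2m (%)': 74.01})
thresholds = {f: ('above' if high_means[f] > low_means[f] else 'below', high_means[f])
              for f in high_means.index}
print(thresholds['temperature_2m (°C)'][0])       # 'above'
print(thresholds['relative_humidity_2m (%)'][0])  # 'below'
```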
In [ ]:
corr = df.corr(numeric_only=True)  # excluding the non-numeric 'confidence_category' column
print("Correlation with High Alert Counts:")
print(corr['high_alert'].sort_values(ascending=False))
Correlation with High Alert Counts:
high_alert                                 1.000000
alert__count                               0.783563
vapour_pressure_deficit (kPa)              0.298493
temperature_2m (°C)                        0.257748
soil_temperature_0_to_7cm (°C)             0.229397
soil_temperature_7_to_28cm (°C)            0.227418
soil_temperature_28_to_100cm (°C)          0.211036
dew_point_2m (°C)                          0.193162
et0_fao_evapotranspiration (mm)            0.188695
soil_temperature_100_to_255cm (°C)         0.188130
month                                      0.184714
day_of_year                                0.175255
week                                       0.170935
season                                     0.167757
year                                       0.130221
direct_radiation_instant (W/m²)            0.097938
sunshine_duration (s)                      0.087851
direct_normal_irradiance_instant (W/m²)    0.087560
adm2                                       0.080088
shortwave_radiation_instant (W/m²)         0.078600
terrestrial_radiation_instant (W/m²)       0.052186
wind_gusts_10m (km/h)                      0.022008
wind_speed_10m (km/h)                      0.019339
is_day ()                                  0.017811
cloud_cover (%)                           -0.017529
precipitation (mm)                        -0.029191
diffuse_radiation_instant (W/m²)          -0.047867
surface_pressure (hPa)                    -0.068558
pressure_msl (hPa)                        -0.127857
day_of_month                              -0.139768
soil_moisture_0_to_7cm (m³/m³)            -0.169841
soil_moisture_7_to_28cm (m³/m³)           -0.186412
relative_humidity_2m (%)                  -0.198260
soil_moisture_28_to_100cm (m³/m³)         -0.201943
soil_moisture_100_to_255cm (m³/m³)        -0.244854
day_of_week                                     NaN
hour                                            NaN
Name: high_alert, dtype: float64
In [ ]:
import matplotlib.pyplot as plt

# List of features to analyze
boxplotfeatures = ['temperature_2m (°C)', 'relative_humidity_2m (%)', 
            'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
            'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)', 
            'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)', 
            'direct_normal_irradiance_instant (W/m²)']

# Calculating and printing summary statistics
summary_stats = high_fire[boxplotfeatures].describe().T[['min', '25%', '50%', '75%', 'max']]
print(summary_stats)

# Creating 3x3 subplots for box plots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))

# Iterating over features and corresponding subplot axes
for i, feature in enumerate(boxplotfeatures):
    ax = axes[i//3, i%3]  
    ax.boxplot(high_fire[feature].dropna())  
    ax.set_title(feature)
    ax.set_ylabel('Values')
    ax.set_xticks([])  

# Removing empty subplots
for i in range(len(boxplotfeatures), len(axes.flatten())):
    fig.delaxes(axes.flatten()[i])

plt.tight_layout()
plt.show()
                                               min         25%         50%  \
temperature_2m (°C)                       8.844048   17.812500   19.305357   
relative_humidity_2m (%)                 52.791667   63.440476   73.517857   
precipitation (mm)                        0.000000    0.000000    0.003571   
et0_fao_evapotranspiration (mm)           0.057083    0.155536    0.191131   
vapour_pressure_deficit (kPa)             0.244524    0.631190    0.758095   
wind_speed_10m (km/h)                     7.463095   10.320238   12.439286   
soil_temperature_0_to_7cm (°C)            8.925000   18.913690   20.604167   
soil_moisture_0_to_7cm (m³/m³)            0.097875    0.107679    0.113750   
direct_normal_irradiance_instant (W/m²)  92.613095  287.964286  328.675000   

                                                75%         max  
temperature_2m (°C)                       20.339881   26.552976  
relative_humidity_2m (%)                  76.083333   83.541667  
precipitation (mm)                         0.022619    0.285119  
et0_fao_evapotranspiration (mm)            0.210774    0.264643  
vapour_pressure_deficit (kPa)              1.060536    1.633810  
wind_speed_10m (km/h)                     14.095833   16.160714  
soil_temperature_0_to_7cm (°C)            23.627381   26.592262  
soil_moisture_0_to_7cm (m³/m³)             0.136482    0.294196  
direct_normal_irradiance_instant (W/m²)  346.464881  402.494643  
No description has been provided for this image
In [ ]:
# List of features to analyze
boxplotfeatures = ['temperature_2m (°C)', 'relative_humidity_2m (%)', 
                   'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
                   'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)', 
                   'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)', 
                   'direct_normal_irradiance_instant (W/m²)']

# Calculating and printing summary statistics
summary_stats = high_fire[boxplotfeatures].describe().T[['min', '25%', '50%', '75%', 'max']]
print(summary_stats)

# Creating 3x3 subplots for box plots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))

# List of colors for each boxplot
colors = ['#FF9999', '#66B3FF', '#99FF99', '#FFCC99', '#FF6666', '#66CC99', '#FFCC66', '#6666FF', '#CC99FF']

# Iterating over features and corresponding subplot axes
for i, feature in enumerate(boxplotfeatures):
    ax = axes[i // 3, i % 3]  
    bp = ax.boxplot(high_fire[feature].dropna(), patch_artist=True, notch=True, boxprops=dict(facecolor=colors[i], color=colors[i]), 
                    whiskerprops=dict(color=colors[i], linewidth=2), capprops=dict(color=colors[i], linewidth=2),
                    medianprops=dict(color='black', linewidth=2))

    ax.set_title(feature)
    ax.set_ylabel('Values')
    ax.set_xticks([])  
    ax.grid(True, linestyle='--', alpha=0.7)  

# Removing empty subplots
for i in range(len(boxplotfeatures), len(axes.flatten())):
    fig.delaxes(axes.flatten()[i])

plt.tight_layout()
plt.show()
                                               min         25%         50%  \
temperature_2m (°C)                       8.844048   17.812500   19.305357   
relative_humidity_2m (%)                 52.791667   63.440476   73.517857   
precipitation (mm)                        0.000000    0.000000    0.003571   
et0_fao_evapotranspiration (mm)           0.057083    0.155536    0.191131   
vapour_pressure_deficit (kPa)             0.244524    0.631190    0.758095   
wind_speed_10m (km/h)                     7.463095   10.320238   12.439286   
soil_temperature_0_to_7cm (°C)            8.925000   18.913690   20.604167   
soil_moisture_0_to_7cm (m³/m³)            0.097875    0.107679    0.113750   
direct_normal_irradiance_instant (W/m²)  92.613095  287.964286  328.675000   

                                                75%         max  
temperature_2m (°C)                       20.339881   26.552976  
relative_humidity_2m (%)                  76.083333   83.541667  
precipitation (mm)                         0.022619    0.285119  
et0_fao_evapotranspiration (mm)            0.210774    0.264643  
vapour_pressure_deficit (kPa)              1.060536    1.633810  
wind_speed_10m (km/h)                     14.095833   16.160714  
soil_temperature_0_to_7cm (°C)            23.627381   26.592262  
soil_moisture_0_to_7cm (m³/m³)             0.136482    0.294196  
direct_normal_irradiance_instant (W/m²)  346.464881  402.494643  
No description has been provided for this image
In [ ]:
# Loading the weather forecast dataset for May 2024
weatherforecasts = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\weatherforecasts.csv", encoding='ISO-8859-1')
In [ ]:
# Converting the 'time' column to datetime and setting it as the index
weatherforecasts['time'] = pd.to_datetime(weatherforecasts['time'])
weatherforecasts.set_index('time', inplace=True)
In [ ]:
# Defining features and their thresholds
features = {
    'temperature_2m (°C)': ('above', 19.103929),
    'relative_humidity_2m (%)': ('below', 70.141190),
    'et0_fao_evapotranspiration (mm)': ('above', 0.178805),
    'vapour_pressure_deficit (kPa)': ('above', 0.835679),
    'wind_speed_10m (km/h)': ('above', 12.126881),
    'soil_temperature_0cm (°C)': ('above', 20.664310),
    'soil_moisture_0_to_1cm (m³/m³)': ('below', 0.141064),
    'direct_normal_irradiance_instant (W/m²)': ('above', 300.593048)
}

# Setting up the subplots
fig, axes = plt.subplots(len(features), 1, figsize=(15, len(features) * 3))

# Plotting each feature in a separate subplot
for i, (feature, (condition, threshold)) in enumerate(features.items()):
    ax = axes[i]
    ax.plot(weatherforecasts.index, weatherforecasts[feature], label=feature, color='blue')
    
    # Adding the threshold zone
    if condition == 'above':
        ax.axhspan(threshold, weatherforecasts[feature].max(), color='red', alpha=0.3)
    else:
        ax.axhspan(weatherforecasts[feature].min(), threshold, color='red', alpha=0.3)
    
    ax.set_title(f'{feature}')
    ax.set_ylabel('Values')
    ax.legend()

    # Customizing the x-axis
    ax.set_xticks(weatherforecasts.index[::24])  
    ax.set_xticklabels(weatherforecasts.index[::24].strftime('%Y-%m-%d'), rotation=45, ha='right')

plt.tight_layout()
plt.show()
No description has been provided for this image
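The shaded zones in the plot above mark where a forecast value crosses into the danger region for its threshold. The same check can be done numerically to count threshold-exceeding hours; a sketch with a toy forecast series:

```python
import pandas as pd

forecast = pd.Series([18.0, 20.5, 22.0, 17.0])  # hypothetical hourly temperatures
condition, threshold = 'above', 19.103929
danger = forecast > threshold if condition == 'above' else forecast < threshold
print(int(danger.sum()))  # 2
```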
In [ ]:
# Defining features and their thresholds
features = {
    'temperature_2m (°C)': ('above', 19.103929),
    'relative_humidity_2m (%)': ('below', 70.141190),
    'et0_fao_evapotranspiration (mm)': ('above', 0.178805),
    'vapour_pressure_deficit (kPa)': ('above', 0.835679),
    'wind_speed_10m (km/h)': ('above', 12.126881),
    'soil_temperature_0cm (°C)': ('above', 20.664310),
    'soil_moisture_0_to_1cm (m³/m³)': ('below', 0.141064),
    'direct_normal_irradiance_instant (W/m²)': ('above', 300.593048)
}

# Setting up subplots
fig = make_subplots(rows=len(features), cols=1, subplot_titles=list(features.keys()))

# Plotting each feature in a separate subplot
for i, (feature, (condition, threshold)) in enumerate(features.items(), start=1):
    # Add line plot
    fig.add_trace(go.Scatter(
        x=weatherforecasts.index,
        y=weatherforecasts[feature],
        mode='lines',
        name=feature,
        hoverinfo='x+y',
        line=dict(color='blue')
    ), row=i, col=1)

    # Adding the threshold zone
    if condition == 'above':
        fig.add_shape(
            type='rect',
            x0=weatherforecasts.index.min(),
            x1=weatherforecasts.index.max(),
            y0=threshold,
            y1=weatherforecasts[feature].max(),
            fillcolor='red',
            opacity=0.3,
            line_width=0,
            row=i, col=1
        )
    else:
        fig.add_shape(
            type='rect',
            x0=weatherforecasts.index.min(),
            x1=weatherforecasts.index.max(),
            y0=weatherforecasts[feature].min(),
            y1=threshold,
            fillcolor='red',
            opacity=0.3,
            line_width=0,
            row=i, col=1
        )

    # Updating x-axis for each subplot
    fig.update_xaxes(tickmode='array', tickvals=weatherforecasts.index[::24], ticktext=weatherforecasts.index[::24].strftime('%Y-%m-%d'), tickangle=45, row=i, col=1)

    # Updating y-axis for each subplot
    fig.update_yaxes(showgrid=True, tickmode='auto', nticks=15, row=i, col=1)

# Updating layout
fig.update_layout(height=len(features) * 300, showlegend=True, hovermode='x unified')

fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\timeseries.html')

# Showing plot
fig.show()
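The threshold logic driving the shaded zones above can be condensed into a single "breach count" per timestamp, i.e. how many of the correlated weather parameters are simultaneously in their dangerous range. This is a minimal sketch using a small hypothetical forecast table with made-up values; the column names and threshold values mirror two entries of the `features` dict above, and `breach_count` follows the same `('above'/'below', threshold)` convention.

```python
import pandas as pd

# Hypothetical mini-forecast with two of the features used above (values invented)
forecast = pd.DataFrame({
    'temperature_2m (°C)': [15.0, 22.5, 25.1],
    'relative_humidity_2m (%)': [80.0, 65.0, 50.0],
}, index=pd.date_range('2024-05-01', periods=3, freq='h'))

# Same (direction, threshold) convention as the features dict above
features = {
    'temperature_2m (°C)': ('above', 19.103929),
    'relative_humidity_2m (%)': ('below', 70.141190),
}

def breach_count(df, features):
    """Count how many feature thresholds are breached at each timestamp."""
    breaches = pd.DataFrame(index=df.index)
    for col, (direction, thr) in features.items():
        breaches[col] = df[col] > thr if direction == 'above' else df[col] < thr
    return breaches.sum(axis=1)

print(breach_count(forecast, features).tolist())  # → [0, 2, 2]
```

A high breach count flags hours where several wildfire-correlated parameters cross their critical levels at once, which is the situation the subplots are meant to surface visually.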
In [ ]:
# Function to create polygons from corner coordinates
def create_polygon(row):
    points = [(row[f'lon-{i}'], row[f'lat-{i}']) for i in range(1, 5)]
    return Polygon(points)

# Keeping only rows whose four corner points all fall within Galicia's bounding box
def filter_data(data):
    mask = pd.Series(True, index=data.index)
    for i in range(1, 5):
        mask &= data[f'lon-{i}'].between(-9.3, -6.73)
        mask &= data[f'lat-{i}'].between(41.8, 43.8)
    # .copy() avoids a SettingWithCopyWarning when geometry columns are added below
    return data.loc[mask].copy()

irre_data = filter_data(irre_data)
fire_data = filter_data(fire_data)

# Creating GeoDataFrames
irre_data['geometry'] = irre_data.apply(create_polygon, axis=1)
fire_data['geometry'] = fire_data.apply(create_polygon, axis=1)

gdf_irre = gpd.GeoDataFrame(irre_data, geometry='geometry')
gdf_fire = gpd.GeoDataFrame(fire_data, geometry='geometry')

# Function to plot choropleth map
def plot_choropleth(gdf, column, title, color_scale='Purples'):
    gdf['center'] = gdf['geometry'].centroid
    gdf['lon'] = gdf['center'].x
    gdf['lat'] = gdf['center'].y
    
    fig = px.choropleth_mapbox(gdf, geojson=gdf.geometry.__geo_interface__, 
                               locations=gdf.index, color=column,
                               mapbox_style="carto-positron", center={"lat": 42.7, "lon": -8.015},
                               zoom=6.5, opacity=0.6, color_continuous_scale=color_scale)
    
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, title=title)
    fig.show()

# Plotting Irreplaceability-score_rank map
plot_choropleth(gdf_irre, 'Irreplaceability-score_rank', 'Irreplaceability Score Rank')

# Plotting Aggregated-fire-risk map using the 'Reds' palette
plot_choropleth(gdf_fire, 'Aggregated-fire-risk', 'Aggregated Fire Risk', color_scale='Reds')
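Each grid cell above is rebuilt from its four corner coordinates by `create_polygon`. A quick sanity check on a single hypothetical row (made-up Galicia coordinates) confirms that listing the corners in ring order yields a valid polygon with the expected area:

```python
from shapely.geometry import Polygon

# Same construction as create_polygon above, applied to one hypothetical row
def create_polygon(row):
    points = [(row[f'lon-{i}'], row[f'lat-{i}']) for i in range(1, 5)]
    return Polygon(points)

row = {'lon-1': -8.1, 'lat-1': 42.6, 'lon-2': -8.0, 'lat-2': 42.6,
       'lon-3': -8.0, 'lat-3': 42.7, 'lon-4': -8.1, 'lat-4': 42.7}
poly = create_polygon(row)
print(poly.is_valid, round(poly.area, 3))  # → True 0.01
```

Note that the corners must be given in ring order (here counter-clockwise); if a dataset listed them in a zig-zag order, the resulting polygon would self-intersect and `is_valid` would be `False`.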
In [ ]:
# Function to plot choropleth map and export to HTML
# Function to plot a choropleth map and export it to HTML
def plot_choropleth_and_export(gdf, column, title, output_file, color_scale='Purples'):
    
    fig = px.choropleth_mapbox(
        gdf, 
        geojson=gdf.geometry.__geo_interface__,
        locations=gdf.index,
        color=column,
        mapbox_style="carto-positron",
        center={"lat": 42.7, "lon": -8.015},
        zoom=6.5,
        opacity=0.6,
        color_continuous_scale=color_scale
    )
    
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, title=title)
    
    # Saving to HTML
    fig.write_html(output_file)

# Plotting and exporting the Irreplaceability-score_rank map
plot_choropleth_and_export(
    gdf_irre, 'Irreplaceability-score_rank',
    'Irreplaceability Score Rank', 'Irreplaceability_score_rank_map.html'
)

# Plotting and exporting the Aggregated-fire-risk map using the 'Reds' palette
plot_choropleth_and_export(
    gdf_fire, 'Aggregated-fire-risk',
    'Aggregated Fire Risk', 'Aggregated_fire_risk_map.html', color_scale='Reds'
)
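A natural next step with these two exported layers is to find where high fire risk overlaps ecologically irreplaceable zones, since those cells deserve protection first. This is only a sketch with two hypothetical grid cells (made-up coordinates inside Galicia's bounding box), using shapely's `intersection`, which the notebook already relies on:

```python
from shapely.geometry import Polygon

# Two hypothetical grid cells: one irreplaceable, one high fire risk
irre_cell = Polygon([(-8.1, 42.6), (-8.0, 42.6), (-8.0, 42.7), (-8.1, 42.7)])
fire_cell = Polygon([(-8.05, 42.65), (-7.95, 42.65), (-7.95, 42.75), (-8.05, 42.75)])

# The overlap marks area that is both irreplaceable and prone to wildfires
overlap = irre_cell.intersection(fire_cell)
print(round(overlap.area, 4))  # → 0.0025
```

On the real GeoDataFrames this pairwise idea generalizes to `geopandas.overlay(gdf_irre, gdf_fire, how='intersection')`, which would produce one row per overlapping cell pair.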